[Figure: LIFE surrounded by many events]
 ( Time, Event type )
 e.g. ( 7:00 am, Wake up )



                                     “Event Sequence”
Daily Activities

7:00 Wake up       7:15 Shower    8:00 Breakfast
Student Progress

Aug’07 Enter   May’09 Master   Apr’12 Defense
Event Sequences

 Medical, Transportation, Sports, Education, Web logs, Logistics, and more…
Two interesting problems
1. Lack of overview

  Show overview
  or summary




                          60,041 patients
                          203,214 traffic incidents
                          7,022 web sessions
                          … and more
Where should I start?
Is the dataset clean?
2. Approximate search

Query: ICU > Floor > ICU, within 2 days
Find something useful and display it.

Results: Found 0 records. Frustrated!
Research Questions


    Overview                           Search
How to provide an overview          How to support users
of multiple event sequences?      when they are uncertain
                               about what they are looking for?


    LifeFlow
                                        Similan
                          Flexible Temporal Search
Outline
                   Approximate
Introduction         Search                Conclusions

           LifeFlow           Case Studies
           Overview

                  How to provide an overview
                  of multiple event sequences?
From one event sequence...
•  Single record              [Cousins91], [Harrison94], [Plaisant98], …


     Patient ID: 45851737
   12/02/2008 14:26  Arrival
   12/02/2008 14:26  Emergency
   12/02/2008 22:44  ICU
   12/05/2008 05:07  Floor
   12/08/2008 10:02  Floor
   12/14/2008 06:19  Discharge

[Figure: compact timeline of Patient #45851737 (Arrival, Emergency Room, ICU, Floor, Discharge) along a time axis]
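A single record like the one above is a time-ordered list of (time, event type) pairs. The sketch below parses lines in the layout shown (date, clock time, event type) into such a list and computes the gaps between consecutive events. The `Event` type and `parse_record` helper are hypothetical names for illustration, not part of any tool cited here.

```python
from datetime import datetime
from typing import List, NamedTuple


class Event(NamedTuple):
    time: datetime
    event_type: str


def parse_record(lines: List[str]) -> List[Event]:
    """Parse 'MM/DD/YYYY HH:MM EventType' lines into a time-sorted event list."""
    events = []
    for line in lines:
        date, clock, event_type = line.split(maxsplit=2)
        when = datetime.strptime(f"{date} {clock}", "%m/%d/%Y %H:%M")
        events.append(Event(when, event_type.strip()))
    # sorted() is stable: events sharing a timestamp keep their input order.
    return sorted(events, key=lambda e: e.time)


record = parse_record([
    "12/02/2008 14:26 Arrival",
    "12/02/2008 14:26 Emergency",
    "12/02/2008 22:44 ICU",
    "12/05/2008 05:07 Floor",
    "12/08/2008 10:02 Floor",
    "12/14/2008 06:19 Discharge",
])

# Gaps between consecutive events, in hours.
gaps_hours = [(b.time - a.time).total_seconds() / 3600
              for a, b in zip(record, record[1:])]
```

Keeping the gaps explicit matters because the overviews that follow summarize not only which events occur but also the time between them.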
To multiple event sequences...
•  Search   [Fails06], [Wang08], [Vrotsou09], …
•  Group    [Phan07], [Burch08], [Wang09], …


but…
Summarize
e.g.   1) What happened to the patients after they arrived?
       2) What happened to the patients before & after ICU?
Overview / Summary




    Millions of records!
Challenges

   Millions of records  →  AGGREGATE  →  Screen
   Squeeze into one screen
   Preserve information!
#1




   LifeFlow
scalable & novel overview

summarizes all possible sequences!
    & gaps between events!
Demo
LifeFlow Design
#1
Event Sequences: n records (#1, #2, #3, …), e.g. 1,000,000
   ↓ Aggregate: O(n)
Tree of Sequences: size ∝ No. of patterns, e.g. 9 nodes
   ↓ Represent
LifeFlow Visual Representation: space-filling technique (horizontal: time; vertical: records)
   Average time   Event Bar   End Node
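The "Aggregate" step can be pictured as merging records that share the same prefix of event types into one tree node. The sketch below is a minimal illustration under that assumption, not the LifeFlow implementation: each node counts the records passing through it, sums the time since the previous event (to report an average time), and counts records that end there. These fields correspond loosely to the slide's Event Bar, Average time and End Node labels. Class and function names are hypothetical, and the example data is made up.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class Node:
    event_type: str
    count: int = 0           # records whose sequence passes through this node
    total_gap: float = 0.0   # summed time since the previous event (0 for first events)
    ended: int = 0           # records whose sequence ends at this node
    children: Dict[str, "Node"] = field(default_factory=dict)

    def average_gap(self) -> float:
        return self.total_gap / self.count if self.count else 0.0


def build_tree(records: Iterable[List[Tuple[float, str]]]) -> Node:
    """Merge time-sorted (time, event_type) sequences into a prefix tree.

    Each event is visited once, so the cost is linear in the total number of events.
    """
    root = Node("root")
    for events in records:
        root.count += 1
        node, prev_time = root, None
        for time, event_type in events:
            child = node.children.setdefault(event_type, Node(event_type))
            child.count += 1
            if prev_time is not None:
                child.total_gap += time - prev_time
            node, prev_time = child, time
        node.ended += 1
    return root


# Times in days since arrival (made-up example data).
tree = build_tree([
    [(0, "Arrival"), (1, "ICU"), (3, "Floor")],
    [(0, "Arrival"), (2, "ICU"), (4, "Discharge")],
    [(0, "Arrival"), (0.5, "Floor")],
])
icu = tree.children["Arrival"].children["ICU"]
print(icu.count, icu.average_gap())  # 2 1.5
```

The tree grows with the number of distinct patterns rather than the number of records, which is why a million records can collapse into a handful of nodes.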
Demo
LifeFlow
User Study


10 participants
12-minute training
15 tasks

Participants could perform the tasks accurately and rapidly.
Quotes

 “Oh! This is very cool!”
 “The tool is easy to understand and easy to use.”
 “LifeFlow provides a great summary of the big picture.”
 “Very easy to find common and uncommon sequences!”
 “Can I use it with my dataset?”

Wait for the case studies :)
Outline
                   How to support users when they are uncertain
                   about what they are looking for?


                       Approximate
Introduction             Search                Conclusions

               LifeFlow            Case Studies
               Overview


         Similarity Search       Hybrid Search
Related Work: Exact Match
 Exact Match: MUST have A, B, C
   Query  →  Record#1, Record#2, Record#3
 •  Event Sequence
    –  TimeSearcher [Hochheiser04]
    –  PatternFinder [Fails06]
    –  LifeLines2 [Wang08]
    –  ActiviTree [Vrotsou09]
    –  QueryMarvel [Jin09]
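As a rough illustration of "MUST have A, B, C", the sketch below treats a record as an exact match when all query event types appear in it in the same order, with other events allowed in between. This matching rule is an assumption made for illustration; the tools listed above each support richer query languages.

```python
from typing import Dict, List


def exact_match(query: List[str], record: List[str]) -> bool:
    """True if the query's event types all occur in `record`, in the same order.

    Other events may appear in between (an assumed matching rule).
    """
    remaining = iter(record)
    # `q in remaining` advances the iterator past the first q it finds,
    # so later query events must be found after earlier ones.
    return all(q in remaining for q in query)


records: Dict[str, List[str]] = {
    "Record#1": ["A", "B", "C"],
    "Record#2": ["A", "C", "B"],
    "Record#3": ["A", "B", "D", "C"],
}
hits = [name for name, seq in records.items() if exact_match(["A", "B", "C"], seq)]
print(hits)  # ['Record#1', 'Record#3']
```

The answer is strictly binary: Record#2 is rejected even though it contains every query event, which is the limitation similarity search addresses.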
Related Work: Similarity Search
•  Image [Kato92]
•  Stock Price [Wattenberg01]
•  Web page [Watai07]
•  Bank account [Chang07]
•  Event Sequence?

 Similarity Search: SHOULD have A, B, C
   Query  →  results ranked from more similar to less similar
     Record#2   0.91
     Record#1   0.83
     Record#3   0.70
Challenges
What is similar? It depends on users/tasks.

 Query: Record #1    A   B   C
 •  Record #2: missing event
 •  Record #3: extra event (A  B  C  D)
 •  Record #4: time difference (A  B  C, with B at a different time)
 •  Record #5: swap (A  C  B)
Match & Mismatch (M&M) Measure
[Figure: Query (Record #1: A, C, B, D) and Record #2 (A, B, C, E) on a time axis. A, B and C are matched events; D is missing from Record #2; E is extra.]

   Time difference
   Number of swaps
   Number of missing events
   Number of extra events
   →  Total Score: 0.00-1.00
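To make the four ingredients concrete, here is a deliberately simplified M&M-style score. It is not the published M&M formula: it pairs the k-th occurrence of each event type in the query with the k-th occurrence in the record, counts missing, extra and swapped events plus the total time difference, maps each onto [0, 1], and averages them with equal weights. The pairing rule, the equal weights and the `time_scale` parameter are all assumptions; `mm_like_score` is a hypothetical name.

```python
from collections import defaultdict
from typing import List, Tuple

Seq = List[Tuple[float, str]]  # time-sorted (time, event_type) pairs


def mm_like_score(query: Seq, record: Seq, time_scale: float = 1.0) -> float:
    """Simplified Match & Mismatch-style similarity in [0, 1] (not the published formula).

    time_scale is a hypothetical constant that normalises the total time difference.
    """
    # Match the k-th occurrence of each event type in the query to the k-th in the record.
    record_times = defaultdict(list)
    for t, event_type in record:
        record_times[event_type].append(t)
    used = defaultdict(int)
    pairs = []  # (query_time, record_time), kept in query order
    for t, event_type in query:
        k = used[event_type]
        if k < len(record_times[event_type]):
            pairs.append((t, record_times[event_type][k]))
            used[event_type] += 1

    m = len(pairs)
    missing = len(query) - m          # query events with no partner in the record
    extra = len(record) - m           # record events not used by any match
    time_diff = sum(abs(tq - tr) for tq, tr in pairs)
    # Swaps: matched pairs whose relative order differs between query and record.
    swaps = sum(1 for i in range(m) for j in range(i + 1, m) if pairs[i][1] > pairs[j][1])
    max_swaps = max(m * (m - 1) // 2, 1)

    parts = [
        1 - missing / len(query) if query else 1.0,
        1 - extra / len(record) if record else 1.0,
        1 - swaps / max_swaps,
        1 / (1 + time_diff / time_scale),
    ]
    return sum(parts) / len(parts)  # equal weights: an assumption


query = [(0, "A"), (2, "C"), (3, "B"), (6, "D")]
record2 = [(0, "A"), (2, "B"), (3, "C"), (7, "E")]
print(round(mm_like_score(query, record2), 3))  # 0.625
```

With the slide's example (times made up), A, B and C are matched with one swap, D is missing and E is extra. Changing the weights changes the ranking, which is one reason "what is similar" depends on users and tasks.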
#2
               Similarity Search

 Similarity measure: Match & Mismatch (What is similar?)
   +  User interface: Similan (Specify query / Display results)

 Version 1, Version 2
Screenshot
  Similan
Controlled Experiment
 Exact Match (LifeLines2)  vs.  Similarity Search (Similan)
 18 participants
Lessons
 Exact Match: counting, confidence (accept / reject)
 Similarity Search: similar, flexible, uncertainty
Combination
 Exact Match + Similarity Search  =  Hybrid (accept / reject)
#3
Flexible Temporal Search (FTS)

 Query: Constraint #1, Constraint #2, Constraint #3; each constraint is mandatory or optional.
 •  “mandatory”: a record that fails a mandatory constraint is rejected.
 •  “optional”: a record that fails an optional constraint is still accepted and appears in the results.
Constraints (each can be mandatory or optional)
•  Event       A   B   C
•  Timing      A on Aug 14, 2000
•  Negation    A, then C, with no B in between
•  Gap         A, then C 1-2 days later
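The gap and negation constraints above can be written as small data records plus check functions. This is a sketch under stated assumptions: the class names, fields and the "any qualifying pair satisfies the constraint" rule are hypothetical, and the `mandatory` flag only records whether a violation should reject the record or merely lower its rank.

```python
from dataclasses import dataclass
from typing import List, Tuple

Seq = List[Tuple[float, str]]  # time-sorted (time, event_type) pairs


@dataclass
class GapConstraint:
    """Hypothetical encoding of 'first, then second, min_gap..max_gap later'."""
    first: str
    second: str
    min_gap: float
    max_gap: float
    mandatory: bool = True  # decides reject vs. still-accept when violated


@dataclass
class NegationConstraint:
    """Hypothetical encoding of 'first, then second, with no forbidden event in between'."""
    first: str
    forbidden: str
    second: str
    mandatory: bool = True


def satisfies_gap(c: GapConstraint, events: Seq) -> bool:
    firsts = [t for t, e in events if e == c.first]
    seconds = [t for t, e in events if e == c.second]
    return any(c.min_gap <= t2 - t1 <= c.max_gap for t1 in firsts for t2 in seconds)


def satisfies_negation(c: NegationConstraint, events: Seq) -> bool:
    for i, (_, e1) in enumerate(events):
        if e1 != c.first:
            continue
        for _, e2 in events[i + 1:]:
            if e2 == c.forbidden:
                break          # forbidden event seen first: try the next `first`
            if e2 == c.second:
                return True
    return False


gap = GapConstraint("A", "C", min_gap=1, max_gap=2)          # A, 1-2 days, C
no_b = NegationConstraint("A", "B", "C", mandatory=False)   # A, C, no B in between
r1 = [(0, "A"), (1.5, "C")]
r2 = [(0, "A"), (1, "B"), (3, "C")]
print(satisfies_gap(gap, r1), satisfies_gap(gap, r2))              # True False
print(satisfies_negation(no_b, r1), satisfies_negation(no_b, r2))  # True False
```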
FTS Matching
[Figure: Query events A, B, C, D, E on a time axis, matched against Record #2 events A, B, D, C]
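FTS Matching (2)
[Figure: dynamic-programming grid with the query (A, B, C, D, E) along i and Record #2 (A, B, D, C) along j, starting from s(0,0), s(1,0), s(0,1)]
Dynamic programming:

$$
s(i,j) = \max
\begin{cases}
s(i-1,\, j) + \mathrm{skip}(\mathit{query}[i]) \\
s(i,\, j-1) + \mathrm{skip}(\mathit{events}[j]) \\
s(i-1,\, j-1) + \mathrm{match}(\mathit{query}[i],\, \mathit{events}[j])
\end{cases}
$$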
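The recurrence for s(i, j) on this slide can be filled in bottom-up, as in any alignment table. The sketch below replaces FTS's similarity vector with a single number per cell to keep it short: `skip_query`, `skip_event` and `match` are hypothetical constants standing in for skip(query[i]), skip(events[j]) and match(query[i], events[j]), and matching is limited to equal event types.

```python
from typing import List


def fts_align(query: List[str], events: List[str],
              skip_query: float = -1.0, skip_event: float = -0.5,
              match: float = 1.0) -> float:
    """Fill s(i, j) with the recurrence on the slide, using scalar scores.

    skip_query, skip_event and match are hypothetical stand-ins for
    skip(query[i]), skip(events[j]) and match(query[i], events[j]).
    Python lists are 0-based, so the slide's query[i] is query[i - 1] here.
    """
    n, m = len(query), len(events)
    s = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = s[i - 1][0] + skip_query      # query event left unmatched (missing)
    for j in range(1, m + 1):
        s[0][j] = s[0][j - 1] + skip_event      # record event left unmatched (extra)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = (s[i - 1][j - 1] + match
                        if query[i - 1] == events[j - 1] else float("-inf"))
            s[i][j] = max(s[i - 1][j] + skip_query,
                          s[i][j - 1] + skip_event,
                          diagonal)
    return s[n][m]


print(fts_align(list("ABCDE"), list("ABDC")))  # 0.5
```

For the slide's example, at most three events can be matched (A, B and one of C/D, since C and D are swapped), leaving two query events missing and one record event extra: 3 - 2 - 0.5 = 0.5. The table has (n+1)(m+1) cells, so the cost per record is O(nm).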
Similarity Vector s(i,j)
•    No. of matched events (mandatory)
•    No. of matched events (optional)
•    No. of negations violated (optional)
•    No. of negations violated (mandatory)
•    No. of time constraints violated
•    Time difference
•    No. of extra events
     –  Extra before the first match
     –  Extra between the first and last match
     –  Extra after the last match
Flexible Temporal Search

 Query + Record#1  →  FTS  →
   Grade: Pass/Fail
   Similarity Score: 0-100, based on
     1.  Missing events
     2.  Extra events
     3.  Negation violations
     4.  Time difference
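One way to turn the similarity vector listed above into a Pass/Fail grade and a 0-100 score is sketched below. Both rules are assumptions, not FTS's actual formulas: the grade passes only when every mandatory event is matched and no mandatory negation is violated, and the score averages four sub-scores (missing events, extra events, negation violations, time difference) with equal weights. The count of time-constraint violations is carried in the vector but not scored here. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SimilarityVector:
    """Fields follow the similarity-vector list above; the types are assumptions."""
    matched_mandatory: int
    matched_optional: int
    negations_violated_optional: int
    negations_violated_mandatory: int
    time_constraints_violated: int
    time_difference: float   # e.g. in days
    extra_before: int
    extra_between: int
    extra_after: int


def grade(v: SimilarityVector, n_mandatory: int) -> str:
    """Pass only when every mandatory event matched and no mandatory negation is violated."""
    ok = v.matched_mandatory == n_mandatory and v.negations_violated_mandatory == 0
    return "Pass" if ok else "Fail"


def similarity_score(v: SimilarityVector, n_query: int, time_scale: float = 1.0) -> float:
    """0-100 score from the four factors on the slide, equally weighted (an assumption)."""
    missing = n_query - v.matched_mandatory - v.matched_optional
    extra = v.extra_before + v.extra_between + v.extra_after
    negations = v.negations_violated_optional + v.negations_violated_mandatory
    parts = [
        1 - missing / n_query if n_query else 1.0,   # 1. missing events
        1 / (1 + extra),                             # 2. extra events
        1 / (1 + negations),                         # 3. negation violations
        1 / (1 + v.time_difference / time_scale),    # 4. time difference
    ]
    return 100 * sum(parts) / len(parts)


v = SimilarityVector(1, 1, 0, 0, 0, 2.0, 0, 1, 0)
print(grade(v, n_mandatory=2), round(similarity_score(v, n_query=3), 1))  # Fail 62.5
```

Keeping the grade and the score separate lets a record fail a mandatory constraint and still be ranked, which matches the hybrid idea of combining exact match with similarity search.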
Demo
Flexible Temporal Search (FTS)
Outline
                      Approximate
Introduction            Search           Conclusions

               LifeFlow        Case Studies
               Overview


                 Multi-dimensional In-depth
                 Long-term Case Studies (MILCs)
“to the wild”
MILCs
#  Domain           Data Size (records)   Duration
1  Medical                  7,041         7 months
2  Transportation         203,214         3 months
3  Medical                 20,000         6 months
4  Medical                 60,041         1 year
5  Web logs                 7,022         6 weeks
6  Activity logs               60         5 months
7  Logistics                  821         6 weeks
8  Sports                      61         5 weeks

           8 case studies / 6 domains
Case #1: Medical

User:   Dr. A. Zach Hettinger
        MedStar Institute for Innovation
        mi2.org


Data:   60,041 patients

Task:   Hospital readmissions
Current Report
Patient   Diagnosis    Visit Date #1   Physician #1   Visit Date #2   Physician #2
Mr. X     Back pain    Jun 10, 2010    Dr. Jones      Jun 29, 2010    Dr. Brown
Mr. Y     Chest pain   Jun 11, 2010    Dr. Jones      Jun 20, 2010    Dr. Jones
…         …            …               …              …               …

          An example of a current report used in a hospital (fake data)


                How many patients came back?
          Did they come back for the 3rd, 4th, … time?
                How many came back and died?
                              …
60,041 patients (events: Registration)
   How many patients came back?
   Did they come back for the 3rd, 4th, … time?
60,041 patients (events: Registration, Death)
   How many came back and died?
60,041 patients (events: Registration, Admission, Death; Location)
60,041 patients (events: Registration, Discharge, Death)
   Find a pattern: Registration > Discharge > Registration > Death
Analyzing data in a new way
   Personal exploration
  Long-term monitoring
     Save more lives!
Case #2: Transportation

User:   CATT Lab at the University of Maryland
        www.cattlab.umd.edu


Data:   203,214 traffic incidents

Task:   Comparing traffic agencies’ performance
100 Years!
Clean the data!
Video
Suspicious distribution!
Detect anomalies
   Clean data
 Large dataset
Case #3: Web logs

User:   Anne Rose
        International Children’s Digital Library
        www.childrenslibrary.org


Data:   7,022 sessions

Task:   How do people read children’s books online?




[Figure: reading sessions, Page 1 → Page 2 → Page 3 → …; annotations: ~5 minutes and 24 seconds]
Understand data
Surprising pattern
New hypotheses
Case #4: Sports

  User:   Daniel Lertpratchya
          Manchester United soccer fan
          www.manutd.com


  Data:   61 soccer matches

  Task:   Find interesting matches to watch replay videos.

          Explore data to find fun facts.



Legend: Begin, Score, Opponent Score, End
Find interesting matches
Demolish another team.
Came back after conceding two goals.
Performance: home vs. away

Legend: Begin, Score, Opponent Score, Missed Penalty, End
Finding specific situations.
#4
               Design Guidelines
•  Align-Rank-Filter
•  Handle event types (e.g. Breakfast, Lunch → Meal)
•  Incorporate attributes
•  Multiple levels of information (Overview, Record, Event)
•  Multiple overviews
•  Coordinated views
•  Search
•  Data preprocessing
•  History / Provenance
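Two of these guidelines, handling event types (grouping Breakfast and Lunch into Meal) and aligning records on an event (to see what happened before and after it, as in the ICU question earlier), amount to simple record transformations. The sketch below shows both under assumptions: `EVENT_GROUPS`, `group_types` and `align` are hypothetical names, times are hours, and the 12:00 lunch is made-up data.

```python
from typing import Dict, List, Optional, Tuple

Seq = List[Tuple[float, str]]  # time-sorted (time, event_type) pairs

# Hypothetical grouping of fine-grained event types into a category.
EVENT_GROUPS: Dict[str, str] = {"Breakfast": "Meal", "Lunch": "Meal"}


def group_types(events: Seq, groups: Dict[str, str] = EVENT_GROUPS) -> Seq:
    """Replace each event type by its group name, if it has one."""
    return [(t, groups.get(e, e)) for t, e in events]


def align(events: Seq, anchor: str) -> Optional[Seq]:
    """Re-express times relative to the first `anchor` event.

    Negative times happened before the anchor, positive ones after it.
    Returns None when the record has no anchor event (it would be filtered out).
    """
    for t0, e in events:
        if e == anchor:
            return [(t - t0, etype) for t, etype in events]
    return None


day = [(7.0, "Wake up"), (8.0, "Breakfast"), (12.0, "Lunch")]
print(align(group_types(day), "Meal"))
# [(-1.0, 'Wake up'), (0.0, 'Meal'), (4.0, 'Meal')]
```

Grouping before aligning keeps the overview small: two fine-grained types become one branch, and every record now shares a common time zero.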
Outline
                      Approximate
Introduction            Search           Conclusions

               LifeFlow        Case Studies
               Overview
Contributions
1.  How to provide an overview of multiple event sequences?
    #1  LifeFlow Visualization: Aggregation, Visual encodings & Interactions

2.  How to support users when they are uncertain about what they are looking for?
    #2  Similarity Search: Similan + Match & Mismatch
    #3  Hybrid Search: Flexible Temporal Search

#4  Case Studies + Design Guidelines
Future Directions
Outflow
•  Improve the visualization & UI: colors, gaps, …
•  New tasks: comparison, attributes in query, …
•  More complex data: stream, interval, concurrency, …
•  Scalability: database, cloud computing, …
Outline
                      Approximate
Introduction            Search              Conclusions

               LifeFlow        Case Studies
               Overview


               This is an event sequence!
refresh
fruitful
Acknowledgement
               Washington Hospital Center
   Dr. A. Zach Hettinger, Dr. Phuong Ho and Dr. Mark Smith

                National Institutes of Health
                    Grant RC1CA147489-02


Center for Integrated Transportation Systems Management
  a Tier 1 Transportation Center at the University of Maryland


                     Study Participants

         Advisors, Committees, HCIL Colleagues


http://www.cs.umd.edu/hcil/lifeflow                kristw@cs.umd.edu / @kristwongz
Thank you

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
  • 1.
  • 2. [Figure: the word “LIFE” surrounded by scattered “event” labels]
  • 3. An event is a (time, event type) pair, e.g., (7:00 am, Wake up). [Figure: “LIFE” surrounded by scattered “event” labels]
  • 4. “Event Sequence” [Figure: “LIFE” surrounded by scattered “event” labels]
  • 5. Daily Activities: 7:00/Wake up, 7:15/Shower, 8:00/Breakfast
  • 6. Student Progress: Aug’07/Enter, May’09/Master, Apr’12/Defense
  • 7. Event Sequences: Medical, Transportation, Sports, Education, Web logs, Logistics, and more…
  • 9. Problem 1: Lack of overview. Show an overview or summary of datasets such as 60,041 patients, 203,214 traffic incidents, 7,022 web sessions, … and more. Where should I start? Is the dataset clean?
  • 10. Problem 2: Approximate search. Query: ICU, then Floor, then ICU within 2 days. Results: found 0 records. Frustrated! Find something useful and display it.
  • 11. Research Questions. Overview: How to provide an overview of multiple event sequences? (LifeFlow) Search: How to support users when they are uncertain about what they are looking for? (Similan, Flexible Temporal Search)
  • 12. Outline: Introduction, Overview (LifeFlow), Approximate Search, Case Studies, Conclusions. Overview: How to provide an overview of multiple event sequences?
  • 13. From one event sequence... Single record: [Cousins91], [Harrison94], [Plaisant98], … Patient ID 45851737: 12/02/2008 14:26 Arrival; 12/02/2008 14:26 Emergency; 12/02/2008 22:44 ICU; 12/05/2008 05:07 Floor; 12/08/2008 10:02 Floor; 12/14/2008 06:19 Discharge. Compact timeline of Patient #45851737 over time: Arrival, Emergency Room, ICU, Floor, Discharge.
  • 14. To multiple event sequences... Search: [Fails06], [Wang08], [Vrotsou09], …
  • 15. To multiple event sequences... Search: [Fails06], [Wang08], [Vrotsou09], …; Group: [Phan07], [Burch08], [Wang09], …
  • 17. Summarize, e.g., 1) What happened to the patients after they arrived? (Arrival → ?) 2) What happened to the patients before and after ICU? (? → ICU → ?)
  • 18. Overview / Summary Millions of records!
  • 19. Challenges: aggregate millions of records to squeeze them into one screen, and preserve information!
  • 20. #1 LifeFlow: a scalable and novel overview that summarizes all possible sequences and the gaps between events!
  • 22. #1 LifeFlow. Event sequences (n records: #1, #2, #3, …, up to 1,000,000) are aggregated in O(n) time into a tree of sequences whose size is proportional to the number of patterns (9 nodes in the example). The tree is drawn with a space-filling technique: each event bar's height represents the number of records, the horizontal gap between bars represents the average time between events, and an end node marks where sequences end.
  • 24. User study: 10 participants, 12-minute training, 15 tasks. Participants could perform the tasks accurately and rapidly.
  • 25. Quotes: “Oh! This is very cool!” “The tool is easy to understand and easy to use.” “LifeFlow provides a great summary of the big picture.” “Very easy to find common and uncommon sequences!” “Can I use it with my dataset?”
  • 26. Wait for the case studies :)
  • 27. Outline: Introduction, Overview (LifeFlow), Approximate Search (Similarity Search, Hybrid Search), Case Studies, Conclusions. How to support users when they are uncertain about what they are looking for?
  • 28. Related Work: Exact Match. Query: the event sequence MUST have A, B, C; matching records (Record #1, #2, #3) are returned. Tools: TimeSearcher [Hochheiser04], PatternFinder [Fails06], LifeLines2 [Wang08], ActiviTree [Vrotsou09], QueryMarvel [Jin09].
  • 29. Related Work: Similarity Search. Image similarity search [Kato92], stock prices [Wattenberg01], web pages [Watai07], bank accounts [Chang07], event sequences? Query: a record SHOULD have A, B, C; results are ranked from more to less similar: Record #2 (0.91), Record #1 (0.83), Record #3 (0.70).
  • 30. Challenges: What is similar? It depends on users and tasks. Query: A, B, C. Record #1: A, B, C. Record #2: a missing event. Record #3: an extra event (A, B, C, D). Record #4: a time difference (A, B, C with different timing). Record #5: a swap (A, C, B).
  • 31. Match & Mismatch (M&M) Measure. The query and a record (example sequences over time: A, C, B, D and A, B, C, E) are compared by matching their events, then combining the time difference between matched events, the number of swaps, the number of missing events, and the number of extra events into a total score from 0.00 to 1.00.
  • 32. #2 Similarity Search = Similarity Measure (Match & Mismatch: what is similar?) + User Interface (Similan: specify the query and display results). Version 1 and Version 2.
  • 34. Controlled experiment: Exact Match (LifeLines2) vs. Similarity Search (Similan), 18 participants.
  • 35. Lessons: Exact Match is good for counting and gives confidence (records are accepted or rejected); Similarity Search finds similar records, is flexible, and handles uncertainty.
  • 36. Combination: Exact Match (accept / reject) + Similarity Search = Hybrid
  • 37. #3 Flexible Temporal Search (FTS): “mandatory”. A query consists of constraints (#1, #2, #3, …), each marked mandatory or optional. If a record fails a mandatory constraint, it is rejected; otherwise it passes and is accepted into the results.
  • 38. #3 Flexible Temporal Search (FTS): “optional”. If a record fails an optional constraint, it is not rejected; it is still accepted into the results.
  • 39. Mandatory constraints: Event (A, B, C); Timing (A on Aug 14, 2000); Negation (A, C, and not B); Gap (A, then C after 1-2 days).
  • 40. Optional constraints: Event (A, B, C); Timing (A on Aug 14, 2000); Negation (A, C, and not B); Gap (A, then C after 1-2 days).
  • 41. FTS Matching: Query (over time): A, B, C, D, E. Record #2: A, B, D, C.
  • 42. FTS Matching (2): dynamic programming over query events i and record events j, starting from s(0,0), s(1,0), s(0,1): $s(i,j) = \max\{\, s(i-1,j) + \mathrm{skip}(query[i]),\; s(i,j-1) + \mathrm{skip}(events[j]),\; s(i-1,j-1) + \mathrm{match}(query[i], events[j]) \,\}$
  • 43. Similarity vector s(i,j): number of matched events (mandatory); number of matched events (optional); number of negations violated (optional); number of negations violated (mandatory); number of time constraints violated; time difference; number of extra events (extra before the first match, extra between the first and last match, extra after the last match).
  • 44. Flexible Temporal Search: FTS grades each record against the query with Pass/Fail plus a similarity score from 0 to 100, based on 1. missing events, 2. extra events, 3. negation violations, 4. time difference.
  • 46. Outline: Introduction, Overview (LifeFlow), Approximate Search, Case Studies, Conclusions. Case studies: Multi-dimensional In-depth Long-term Case Studies (MILCs).
  • 48. MILCs: 8 case studies in 6 domains.
      # | Domain | Data size (records) | Duration
      1 | Medical | 7,041 | 7 months
      2 | Transportation | 203,214 | 3 months
      3 | Medical | 20,000 | 6 months
      4 | Medical | 60,041 | 1 year
      5 | Web logs | 7,022 | 6 weeks
      6 | Activity logs | 60 | 5 months
      7 | Logistics | 821 | 6 weeks
      8 | Sports | 61 | 5 weeks
  • 49. Case #1: Medical. User: Dr. A. Zach Hettinger, MedStar Institute for Innovation (mi2.org). Data: 60,041 patients. Task: hospital readmissions.
  • 50. Current report: an example of a report currently used in a hospital (fake data).
      Patient | Diagnosis | Visit #1 date | Physician #1 | Visit #2 date | Physician #2
      Mr. X | Back pain | Jun 10, 2010 | Dr. Jones | Jun 29, 2010 | Dr. Brown
      Mr. Y | Chest pain | Jun 11, 2010 | Dr. Jones | Jun 20, 2010 | Dr. Jones
      … | … | … | … | … | …
      Questions: How many patients came back? Did they come back a 3rd, 4th, … time? How many came back and died? …
  • 51. 60,041 patients (events shown: Registration). How many patients came back? Did they come back a 3rd, 4th, … time?
  • 52. 60,041 patients (events shown: Registration, Death). How many came back and died?
  • 53. 60,041 patients: Location, Registration, Admission, Death.
  • 54. 60,041 patients. Find a pattern: Registration > Discharge > Registration > Death (events shown: Registration, Discharge, Death).
  • 56. Analyzing data in a new way: personal exploration, long-term monitoring. Save more lives!
  • 57. Case #2: Transportation. User: CATT Lab at the University of Maryland (www.cattlab.umd.edu). Data: 203,214 traffic incidents. Task: comparing traffic agencies’ performance.
  • 60. Video
  • 61.
  • 62.
  • 64.
  • 65. Detect anomalies; clean data; large dataset.
  • 66. Case #3: Web logs. User: Anne Rose, International Children’s Digital Library (www.childrenslibrary.org). Data: 7,022 sessions. Task: How do people read children’s books online? (PAGE 1, PAGE 2, PAGE 3, …)
  • 67.
  • 68.
  • 72. Case #4: Sports. User: Daniel Lertpratchya, Manchester United soccer fan (www.manutd.com). Data: 61 soccer matches. Tasks: find interesting matches to watch replay videos; explore the data to find fun facts. Events: Begin, Score, Opponent Score, End.
  • 75. Came back after conceding two goals.
  • 76. Performance: home vs. away. Events: Begin, Score, Opponent Score, Missed Penalty, End.
  • 78. #4 Design Guidelines: Align-Rank-Filter; handle event types (e.g., group Breakfast and Lunch into Meal); incorporate attributes; multiple levels of information (overview, record, event); multiple overviews; coordinated views; search; data preprocessing; history / provenance.
  • 79. Outline: Introduction, Overview (LifeFlow), Approximate Search, Case Studies, Conclusions.
  • 80. Contributions: 1. How to provide an overview of multiple event sequences? #1 LifeFlow visualization: aggregation, visual encodings, and interactions. 2. How to support users when they are uncertain about what they are looking for? #2 Similarity Search: Similan + Match & Mismatch; #3 Hybrid Search: Flexible Temporal Search. #4 Case Studies + Design Guidelines.
  • 81. Future directions: Outflow; improve the visualization and UI (colors, gaps, …); new tasks (comparison, attributes in query, …); more complex data (stream, interval, concurrency, …); scalability (database, cloud computing, …).
  • 83. Outline: Introduction, Overview (LifeFlow), Approximate Search, Case Studies, Conclusions. This is an event sequence!
  • 85.
  • 86.
  • 87.
  • 89. Acknowledgements: Washington Hospital Center (Dr. A. Zach Hettinger, Dr. Phuong Ho, and Dr. Mark Smith); National Institutes of Health Grant RC1CA147489-02; Center for Integrated Transportation Systems Management, a Tier 1 Transportation Center at the University of Maryland; study participants; advisors, committees, and HCIL colleagues.
  • 90. Contributions: 1. How to provide an overview of multiple event sequences? LifeFlow visualization: aggregation, visual encodings, and interactions. 2. How to support users when they are uncertain about what they are looking for? Similarity Search: Similan + Match & Mismatch; Hybrid Search: Flexible Temporal Search. Case Studies + Design Guidelines. http://www.cs.umd.edu/hcil/lifeflow, kristw@cs.umd.edu / @kristwongz
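The aggregation step on slide 22 (n records merged in O(n) into a tree of sequences, with bar height for record count and gaps for average time) can be sketched as a prefix tree. This is a minimal illustration, not LifeFlow's implementation: the `Node` class, the hour-based gaps, and the text printout standing in for the space-filling drawing are all assumptions. The first record reuses the first three events of Patient #45851737 from slide 13; the second record is made up.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Node:
    """One tree node: an event type reached by a specific prefix of events (hypothetical)."""
    event: str
    count: int = 0          # records whose sequence passes through this node (bar height)
    total_gap: float = 0.0  # summed hours since the previous event, over those records
    gap_count: int = 0      # number of gaps summed (a first event has no gap)
    children: dict = field(default_factory=dict)

    def avg_gap(self) -> float:
        return self.total_gap / self.gap_count if self.gap_count else 0.0


def build_tree(records):
    """Merge sequences that share a prefix into one tree, in one pass over all events."""
    root = Node("ROOT")
    for seq in records:  # seq: list of (datetime, event type), sorted by time
        root.count += 1
        node, prev_time = root, None
        for time, event in seq:
            child = node.children.setdefault(event, Node(event))
            child.count += 1
            if prev_time is not None:
                child.total_gap += (time - prev_time).total_seconds() / 3600
                child.gap_count += 1
            node, prev_time = child, time
    return root


def show(node, depth=0):
    """Print each node's record count and average gap (a text stand-in for the bars)."""
    for child in node.children.values():
        print(f"{'  ' * depth}{child.event}: {child.count} records, "
              f"avg gap {child.avg_gap():.1f} h")
        show(child, depth + 1)


records = [
    [(datetime(2008, 12, 2, 14, 26), "Arrival"),
     (datetime(2008, 12, 2, 14, 26), "Emergency"),
     (datetime(2008, 12, 2, 22, 44), "ICU")],
    [(datetime(2008, 12, 3, 9, 0), "Arrival"),
     (datetime(2008, 12, 3, 10, 0), "Emergency"),
     (datetime(2008, 12, 3, 12, 0), "Floor")],
]
tree = build_tree(records)
show(tree)
```

Each event is visited once, so the work grows linearly with the total number of events, while the tree size depends only on the number of distinct prefixes (patterns).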
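Questions like “What happened to the patients before and after ICU?” (slide 17) and the Align step of the Align-Rank-Filter guideline (slide 78) come down to re-expressing each record relative to a chosen event. Below is a minimal sketch under stated assumptions: `align_on` is a hypothetical helper, times are plain numbers in hours, records align on the *first* occurrence of the event, and records without it are dropped.

```python
def align_on(records, sentinel):
    """Align every record on the first occurrence of `sentinel` (hypothetical helper).

    records: dict of id -> list of (time in hours, event type), sorted by time.
    Returns id -> (events before, events after), with times relative to the sentinel.
    Records that never contain the sentinel are left out.
    """
    aligned = {}
    for rid, seq in records.items():
        idx = next((k for k, (_, e) in enumerate(seq) if e == sentinel), None)
        if idx is None:
            continue
        t0 = seq[idx][0]
        before = [(t - t0, e) for t, e in seq[:idx]]
        after = [(t - t0, e) for t, e in seq[idx + 1:]]
        aligned[rid] = (before, after)
    return aligned


records = {
    "p1": [(0, "Arrival"), (2, "ICU"), (5, "Floor")],
    "p2": [(0, "Arrival"), (1, "Floor")],
    "p3": [(0, "Arrival"), (4, "Emergency"), (6, "ICU")],
}
aligned = align_on(records, "ICU")
print(aligned)
```

Negative times then describe what happened before the aligned event and positive times what happened after, which is the split an aligned overview would summarize on each side.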
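The readmission question on slide 54 (find patients with Registration > Discharge > Registration > Death) can be approximated as an ordered-subsequence test. This is a sketch, not the tool's query semantics: whether other events may appear between pattern steps is an assumption (here they may), and the patient records are invented for illustration.

```python
def contains_pattern(seq, pattern):
    """True if the event types in `pattern` occur in `seq` in this order.

    Other events may occur in between (an assumption for this sketch).
    """
    events = iter(e for _, e in seq)
    # `any` advances the shared iterator, so each step searches after the previous match.
    return all(any(e == p for e in events) for p in pattern)


pattern = ["Registration", "Discharge", "Registration", "Death"]
records = {
    "p1": [(0, "Registration"), (3, "Discharge"), (40, "Registration"), (45, "Death")],
    "p2": [(0, "Registration"), (2, "Discharge"), (90, "Death")],
    "p3": [(0, "Registration"), (1, "Admission"), (5, "Discharge"),
           (20, "Registration"), (21, "Admission"), (30, "Death")],
}
matches = [rid for rid, seq in records.items() if contains_pattern(seq, pattern)]
print(matches, f"{len(matches)} of {len(records)} patients")
```

If consecutive events were required instead, as a path in an aggregated tree would imply, the check would compare a sliding window of event types rather than consuming an iterator.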
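Slide 31 lists the ingredients of the Match & Mismatch measure (time difference, swaps, missing events, extra events, combined into a score from 0.00 to 1.00). The sketch below only illustrates how such a score *could* be assembled. The matching rule (pair the k-th occurrence of each event type in order), the equal weights, and the 24-hour `time_scale` are hypothetical choices, not the published M&M definition.

```python
from collections import defaultdict
from itertools import combinations


def mm_score(query, record, weights=(0.25, 0.25, 0.25, 0.25), time_scale=24.0):
    """Simplified Match & Mismatch-style similarity in [0, 1] (hypothetical weights).

    query, record: lists of (time in hours, event type), sorted by time.
    """
    # Pair the k-th occurrence of each event type in the query with the k-th
    # occurrence of the same type in the record (a simplification).
    slots = defaultdict(list)
    for t, e in record:
        slots[e].append(t)
    used = defaultdict(int)
    pairs = []
    for t, e in query:
        if used[e] < len(slots[e]):
            pairs.append((t, slots[e][used[e]]))
            used[e] += 1
    m = len(pairs)
    if m == 0:
        return 0.0  # nothing in common
    missing = len(query) - m
    extra = len(record) - m
    # A swap is a pair of matched events whose order differs between query and record.
    swaps = sum(1 for (q1, r1), (q2, r2) in combinations(pairs, 2)
                if (q1 - q2) * (r1 - r2) < 0)
    avg_diff = sum(abs(q - r) for q, r in pairs) / m
    penalties = (
        missing / len(query),
        extra / len(record),
        swaps / (m * (m - 1) / 2) if m > 1 else 0.0,
        min(1.0, avg_diff / time_scale),
    )
    return 1.0 - sum(w * p for w, p in zip(weights, penalties))


query = [(0, "A"), (10, "B"), (20, "C")]
examples = {
    "same": query,
    "missing": [(0, "A"), (20, "C")],
    "extra": query + [(30, "D")],
    "swap": [(0, "A"), (10, "C"), (20, "B")],
}
for name, rec in examples.items():
    print(name, round(mm_score(query, rec), 3))
```

Because every penalty is normalized to [0, 1] and the weights sum to 1, the score stays between 0.00 and 1.00, which keeps results from different records comparable for ranking.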
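The mandatory/optional rule on slides 37-40 (failing a mandatory constraint rejects a record; failing an optional one does not) can be sketched as a filter followed by a ranking. The `Constraint` class, the two constraint builders, ranking by the number of optional violations, and the day-based times are illustrative assumptions; the slides' FTS also computes a richer similarity score.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Record = List[Tuple[float, str]]  # (time in days, event type), sorted by time


@dataclass
class Constraint:
    name: str
    check: Callable[[Record], bool]
    mandatory: bool  # mandatory: failing rejects; optional: failing only lowers rank


def has_event(e: str) -> Callable[[Record], bool]:
    """Event constraint: the record contains event type e."""
    return lambda r: any(ev == e for _, ev in r)


def gap(a: str, b: str, lo: float, hi: float) -> Callable[[Record], bool]:
    """Gap constraint: some b occurs lo..hi days after some a."""
    return lambda r: any(ea == a and eb == b and lo <= tb - ta <= hi
                         for ta, ea in r for tb, eb in r)


def grade(record: Record, constraints: List[Constraint]):
    """Return (accepted, number of optional constraints violated)."""
    if any(c.mandatory and not c.check(record) for c in constraints):
        return False, None
    return True, sum(1 for c in constraints if not c.mandatory and not c.check(record))


def search(records, constraints):
    """Keep records passing every mandatory constraint, fewest optional violations first."""
    graded = [(rid, grade(r, constraints)) for rid, r in records.items()]
    accepted = [(rid, v) for rid, (ok, v) in graded if ok]
    return sorted(accepted, key=lambda item: item[1])


records = {
    "r1": [(0, "A"), (1.5, "C")],
    "r2": [(0, "A"), (5, "C")],
    "r3": [(0, "B"), (1, "C")],
}
constraints = [
    Constraint("has A", has_event("A"), mandatory=True),
    Constraint("C 1-2 days after A", gap("A", "C", 1, 2), mandatory=False),
]
results = search(records, constraints)
print(results)
```

Here r3 is rejected because it lacks the mandatory A, while r2 misses only the optional gap and stays in the results below r1.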
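The recurrence on slide 42 can be run directly as an alignment between the query's event types and a record's event types. This sketch simplifies the slide: it uses scalar scores (hypothetical `match=2`, `skip_query=-1`, `skip_event=0`) where FTS uses the similarity vector from slide 43, and a match is only allowed between equal event types. The example is the query A, B, C, D, E against Record #2 (A, B, D, C) from slide 41.

```python
def fts_align(query, events, match=2, skip_query=-1, skip_event=0):
    """Align query event types with a record's event types by dynamic programming.

    Scalar scores are a simplification; returns (best score, matched (i, j) index pairs).
    """
    n, m = len(query), len(events)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = s[i - 1][0] + skip_query
    for j in range(1, m + 1):
        s[0][j] = s[0][j - 1] + skip_event
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(s[i - 1][j] + skip_query,   # query event i is missing
                       s[i][j - 1] + skip_event)   # record event j is extra
            if query[i - 1] == events[j - 1]:
                best = max(best, s[i - 1][j - 1] + match)
            s[i][j] = best
    # Trace back from s(n, m) to recover which events were matched.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if query[i - 1] == events[j - 1] and s[i][j] == s[i - 1][j - 1] + match:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif s[i][j] == s[i - 1][j] + skip_query:
            i -= 1
        else:
            j -= 1
    return s[n][m], pairs[::-1]


score, pairs = fts_align(list("ABCDE"), list("ABDC"))
print(score, pairs)
```

Because D and C are swapped in the record, at most one of them can be matched; with these weights the alignment matches A, B, and C, and treats D and E as missing and the record's D as extra.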