My Research Defense

Published in: Technology
  • My Research Defense

    1. Mental Workload in Multi-Device Personal Information Management
        Manas Tungare
        Advisory Committee: Dr. Manuel Pérez-Quiñones, Dr. Stephen H. Edwards, Dr. Edward A. Fox, Prof. Steve Harrison, Dr. Tonya Smith-Jackson
        Thursday, February 12, 2009
    2. Talk outline
        • 0 to ~45 min: Presentation & questions
        • ~45 to 90 min: Additional comments, suggestions
        • OK to record audio?
        • Your questions/comments are welcome at any time.
    3. Problem statement & Research questions
    4. Personal information, Multiple devices
    5. State of the art
        • Difficult to maintain files on 2+ machines
            • Workaround: USB drives, email-to-self
        • Multiple paper calendars are difficult to read
            • Workaround: Online calendars
        • Hard to enter phone numbers on phone
            • Workaround: Sticky notes
    6. General Hypothesis
        • PIM strategies may result in high workload
            • leading to increased perception of task difficulty
        • Alternate strategies may lead to lower workload
    7. Mental workload issues
        • What is the mental workload incurred by users when they are trying to use multiple devices for personal information management?
        • For those tasks that users have indicated are frustrating for them, do the alternate strategies result in lower mental workload?
        • Are multi-dimensional subjective workload assessment techniques (such as NASA TLX) an accurate indicator of operator performance in information ecosystems?
    8. Mental workload
        • [...] “That portion of an operator’s limited capacity actually required to perform a particular task.” [O’Donnell and Eggemeier, 1986]
        • Low to moderate levels of workload are associated with acceptable levels of operator performance [Wilson and Eggemeier, 2006]
        • Measured using subjective measures or physiological measures
    9. Research Question 1
        • RQ: Do alternate strategies impose different levels of mental workload?
        • Hypothesis: Alternate strategies lead to lower mental workload than the standard strategies
        • Experiment: Compare mental workload for tasks identified as difficult, and for their respective workarounds
    10. Research Question 2
        • RQ: Are subjective assessments of mental workload an accurate indicator of operator performance in this domain?
        • Hypothesis: Mental workload measured by NASA TLX can be used to predict operator performance
        • Experiment: (Attempt to) correlate workload assessments with operator performance
    11. Research Question 3
        • RQ: Are both subjective measures of workload (TLX) and physiological measures (pupil diameter) sensitive to PIM tasks?
        • What can we learn from changes in pupil diameter in relation to sub-task boundaries?
    12. Experiment design
    13. Survey
        • N ⊂ 220
        • Responses to free-form questions in survey
        • 5 tag types defined a priori:
            • Devices, tasks, problems, solutions, results
        • Tags based on emergent codes
            • Device=laptop, desktop
            • Problem=syncFailed, conflictingEdits
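
The tagging scheme on the survey slide (a priori tag types, emergent tag values) can be tallied with a few lines of Python. This is only an illustrative sketch: the `Type=value` string encoding, the `tally_tags` helper, and the sample responses are assumptions made for the example, not the study's actual coding tool or data.

```python
from collections import Counter, defaultdict


def tally_tags(tagged_responses):
    """Count tag values per tag type from lists of 'Type=value' strings."""
    counts = defaultdict(Counter)
    for tags in tagged_responses:
        for tag in tags:
            tag_type, _, value = tag.partition("=")
            counts[tag_type.strip()][value.strip()] += 1
    return counts


# Hypothetical coded responses (not data from the survey).
responses = [
    ["Device=laptop", "Problem=syncFailed"],
    ["Device=desktop", "Device=laptop", "Problem=conflictingEdits"],
]

counts = tally_tags(responses)
for tag_type, values in counts.items():
    print(tag_type, dict(values))
```
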
    14. Experiment design
        • Within subjects (repeated measures), in two sessions 2 weeks apart (to minimize learning effects)
        • Complete block design
        • Two-factor (task, level of system support)
        • 6 treatments: 3 tasks ⨉ 2 levels of system support
        • Counterbalanced to minimize order effects
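
The 3 ⨉ 2 treatment structure can be enumerated directly. The slides do not say how the order was counterbalanced or how treatments were split across the two sessions, so the cyclic rotation below is an assumed scheme (one simple Latin-square style option), and the function names are hypothetical.

```python
from itertools import product

TASKS = ["Files", "Calendar", "Contacts"]
LEVELS = ["L0", "L1"]  # L0: no system support, L1: with system support


def treatments():
    """All task x level combinations (3 x 2 = 6 treatments)."""
    return list(product(TASKS, LEVELS))


def counterbalanced_order(participant_index):
    """Assumed scheme: rotate the treatment list by one position per participant."""
    all_treatments = treatments()
    k = participant_index % len(all_treatments)
    return all_treatments[k:] + all_treatments[:k]


for p in range(3):
    print(p, counterbalanced_order(p))
```

With this rotation, every treatment appears in every ordinal position exactly once across each group of six participants.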
    15. Overview
        Task        Level 0                           Level 1
        Files       No support for file migration     System supports file migration
        Calendar    Multiple paper calendars          Online calendars
        Contacts    No support for synchronization    Devices support synchronization
        [Figure: sample paper calendar sheets ("PIM Study - Home", week of January 5 to January 11, 2009) with entries such as Team Outing, Dentist's appointment, and Michael's Little League game]
    16. Sample size estimation
        • After first 8 participants:
        Task         Cohen’s d    Sample size estimate
        Files        0.671        9.778
        Calendar     0.528        15.098
        Contacts     0.536        14.672
        All tasks    0.602        11.861
        • Effect sizes = Medium to High for Overall Workload
        • Goal for sample size is 20
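
For readers unfamiliar with estimating sample size from an effect size, here is a minimal sketch using the common normal approximation for a paired comparison, n ≈ ((z_{1-α/2} + z_{power}) / d)². The slide does not state which formula, α, or power produced its estimates, so the defaults below (α = 0.05, power = 0.80) are assumptions and the output should not be expected to match the slide's numbers.

```python
from statistics import NormalDist


def paired_sample_size(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size for a paired comparison with effect size d.

    Assumed method for illustration only; alpha and power are not from the slides.
    """
    if d <= 0:
        raise ValueError("effect size d must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    return ((z_alpha + z_beta) / d) ** 2


# Effect sizes taken from the slide; the resulting n depends on the assumptions above.
for task, d in [("Files", 0.671), ("Calendar", 0.528),
                ("Contacts", 0.536), ("All tasks", 0.602)]:
    print(f"{task}: d = {d}, n = {paired_sample_size(d):.1f}")
```
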
    17. Participants
        • Knowledge workers recruited via email, flyers, personal contacts and promises of pizza
        • Experienced in laptop & phone use
        • N=11 (6 Male, 5 Female)
        [Chart: number of participants by age group (18-21, 22-25, 26-30, 31-35)]
    18. [Photo of the experimental setup, labeled: Eye tracker, Desktop, Instructions Display, Eye tracker recorder, Laptop]
        Photo credit: Ramanujam Parthasarathy
    19. Task familiarization
        • 6 videos were made, related to tasks
            • each between 2–6 minutes long
        • 10 familiarization tasks required to be performed before experimental tasks
        • Watch videos
    20. Files task
        • Start on desktop
        • Set of instructions to edit specific files
        • Then move to laptop, edit more files
        • Move back to desktop
        • L0: using USB drives, email-to-self
        • L1: using a Network drive
    21. Task instructions
    22. Calendar task
        • Set of instructions to create, replace, update, delete calendar entries
        • “Today is …”
        • Questions on availability and schedule
        • L0: Paper calendars, home and work
        • L1: Online calendars, home and work
    23. Task instructions
    24. Contacts task
        • Set of instructions to create, replace, update, delete contact records
        • “You may/may not use your phone/laptop”
        • L0: phone + laptop, no sync support
        • L1: phone + laptop, with sync support
    25. Task instructions
    26. Measures
        • Time on task
            • captured by app that displays instructions
        • Task performance metrics (vary by task)
        • NASA TLX
        • Pupillometric data from eye tracker
    27. Why NASA TLX
        • Higher correlation with performance (concurrent validity) as compared to SWAT and WP [Rubio & Díaz, 2004]
        • Validated in several environments since 1988 [several, 1988-present]
        • Sensitive to some differences not discriminated by SWAT [Battiste 1988]
        • Highest sensitivity among 4 scales [Hill 1989]
    28. Pupillometric data
        • Pupil diameter can be used as an estimate of mental workload [Beatty 1982]
        • Task-Evoked Pupillary Response (TEPR)
        • Physiological measure (not subjective)
        • Continuous measure (unlike TLX)
        • Post-processing is required
    29. Analysis
    30. RQ1: Workload at L0 & L1
        • Effort is significantly lower at α=0.05 for L1 than for L0 (ANOVA) for N=8
        Measure                 Mean L0    Mean L1    p value
        MD: Mental Demand       48.9       40         0.1878
        PD: Physical Demand     35.3       33         0.7271
        TD: Temporal Demand     40.3       30.5       0.1197
        OP: Own Performance     27.6       17.8       0.0604
        EF: Effort              51.1       35.5       0.0382 ✓
        FR: Frustration         38.2       25.8       0.0564
        OW: Overall Workload    41.4       31.4       0.0666
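
As background for the Overall Workload (OW) row: in the standard NASA TLX procedure, each of the six subscales is rated from 0 to 100 and weighted by how often it was chosen in 15 pairwise comparisons, so the weights sum to 15. The sketch below shows that weighted score next to the unweighted "raw TLX" mean. The slides do not say which variant was used, and the ratings and weights here are hypothetical.

```python
SUBSCALES = ["MD", "PD", "TD", "OP", "EF", "FR"]


def overall_workload(ratings, weights=None):
    """Overall workload from six subscale ratings (0-100).

    With weights (tallies from the 15 pairwise comparisons), returns the
    weighted NASA TLX score; without weights, returns the raw (unweighted) mean.
    """
    if weights is None:
        return sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)
    total = sum(weights[s] for s in SUBSCALES)
    if total != 15:
        raise ValueError("pairwise-comparison weights must sum to 15")
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / total


# Hypothetical participant data (not from the study).
ratings = {"MD": 60, "PD": 20, "TD": 40, "OP": 30, "EF": 70, "FR": 50}
weights = {"MD": 4, "PD": 0, "TD": 2, "OP": 3, "EF": 5, "FR": 1}

print(overall_workload(ratings, weights))  # weighted score
print(overall_workload(ratings))           # raw TLX mean
```
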
    31. Time on task
        Task        L0 Mean (SD)    L1 Mean (SD)    p value
        Files       2663 (802)      2309 (601)      0.394
        Calendar    2754 (1677)     1786 (1077)     0.226
        Contacts    2558 (1368)     1832 (1478)     0.377
    32. RQ2: Performance predictor
        • TLX OW not found to correlate highly with time on task
        • Pearson’s r: Workload ~ Time on Task
            • r = 0.188 for Files
            • r = –0.014 for Calendars
            • r = 0.031 for Contacts
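
For reference, Pearson's r between workload scores and time on task is the covariance of the two variables divided by the product of their standard deviations. A minimal, dependency-free sketch follows; the paired values are made up for illustration and are not the study's data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical per-participant values: overall workload and time on task.
workload = [41.0, 35.5, 52.0, 28.0, 47.5]
time_on_task = [2500, 2700, 2400, 2650, 2300]
print(round(pearson_r(workload, time_on_task), 3))
```
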
    33. RQ2: Performance predictor
        Pearson’s r             Files     Calendar    Contacts
        MD: Mental Demand       0.271     –0.171      0.087
        PD: Physical Demand     0.140     0.190       –0.226
        TD: Temporal Demand     0.095     0.074       –0.254
        OP: Own Performance     0.288     0.036       –0.086
        EF: Effort              0.196     0.016       0.227
        FR: Frustration         0.393     0.135       0.083
        OW: Overall Workload    0.188     0.014       –0.031
        • Further analysis at step-level, not task-level
    34. Time on task: Files
        [Chart: time taken (s) per step, steps 1 to 10, for L0 and L1. Annotations: move from Desktop to Laptop, p = 0.1624; move from Laptop to Desktop, p = 0.1577]
    35. Time on task: Calendars
        [Chart: time taken (s) per step, steps 1 to 15, for L0 and L1. Annotations: data lookup steps; data entry steps]
    36. Time on task: Contacts
        [Chart: time taken (s) per step, steps 1 to 6, for L0 and L1]
    37. RQ3: TLX & Pupillometric
        • Analyzing pupillometric data
            • 100,000 data points per session @ 30 Hz
            • Need to filter blinks
            • Establish baseline; compute relative changes
            • Signal smoothing techniques
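
The post-processing steps listed on this slide (filter blinks, establish a baseline, compute relative changes, smooth the signal) could look roughly like the sketch below. It is not the pipeline used in the study: the blink criterion (samples below a validity threshold), linear interpolation across blinks, a baseline taken from the first samples, and a centered moving average are all assumptions chosen for illustration. For scale, 100,000 samples at 30 Hz is roughly 55 minutes of recording.

```python
def interpolate_blinks(samples, min_valid=1.0):
    """Replace blink samples (assumed: values below min_valid) by linear interpolation."""
    cleaned = list(samples)
    valid = [i for i, v in enumerate(cleaned) if v >= min_valid]
    if not valid:
        raise ValueError("no valid pupil samples")
    # Leading and trailing gaps take the nearest valid value.
    for i in range(valid[0]):
        cleaned[i] = cleaned[valid[0]]
    for i in range(valid[-1] + 1, len(cleaned)):
        cleaned[i] = cleaned[valid[-1]]
    # Interior gaps are filled on a straight line between neighboring valid samples.
    for a, b in zip(valid, valid[1:]):
        for i in range(a + 1, b):
            t = (i - a) / (b - a)
            cleaned[i] = cleaned[a] + t * (cleaned[b] - cleaned[a])
    return cleaned


def relative_change(samples, baseline_count):
    """Change relative to the mean of the first baseline_count samples."""
    baseline = sum(samples[:baseline_count]) / baseline_count
    return [(v - baseline) / baseline for v in samples]


def moving_average(samples, window):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        lo, hi = max(0, i - half), min(len(samples), i + half + 1)
        smoothed.append(sum(samples[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical pupil radii in eye-image pixels; 0 marks a blink.
raw = [50, 51, 0, 0, 54, 55, 56, 0, 58]
cleaned = interpolate_blinks(raw)
signal = moving_average(relative_change(cleaned, baseline_count=2), window=3)
print([round(v, 3) for v in signal])
```
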
    38. Initial results from pupillometric data
        [Chart: pupil radius (eye image pixels) against time elapsed (seconds, 0 to 600), with step markers S0 to S10]
    39. Initial results from pupillometric data
        [Chart: pupil radius (eye image pixels) against time elapsed (seconds, 0 to 1000), with step markers S0 to S10]
    40. Expected outcomes (“So what?”)
    41. Alternate strategies
        • Lower mental workload ✓
        • Lower time on task ✓
        • Synced phones have more data entered ✓
        • (Slightly) fewer errors
    42. Observations
        • Online calendars provided a frame of reference at all times (highlighted day)
        • A few chose not to sync calendars (in L1)
        • None prepared for the transition until asked to switch machines
        • The step after “move now” takes a lot of time: participants don’t realize they’re missing information until they need it
    43. Identify critical sub-tasks
        • Files: Time-on-task was a highly discriminative measure for the sub-task of moving from one machine to another
        • Pupillometric measure appears sensitive to changes in workload across sub-tasks
        • Calendar task: paper was faster for data entry, online was faster for lookup
        ★ Optimize selectively, remove bottlenecks
    44. Cross-task measure
        • TLX can be used to study PIM tasks
            • E.g. which of browsing or searching leads to higher workload?
            • E.g. does Tool A lead to lower workload than Tool B?
    45. Studying multiple devices
        • Study each one individually?
        • What happens at the transition …
        • ...
    46. Questions & comments
        Note to self: Turn off audio recording before committee deliberation.
    47. Questions & comments
        Thank you!
        Note to self: Turn off audio recording before committee deliberation.
