COSC 426 Lect. 7: Evaluating AR Applications

A lecture on evaluating AR interfaces, from the graduate course on Augmented Reality, taught by Mark Billinghurst from the HIT Lab NZ at the University of Canterbury.

  1. Lecture 7: Evaluating AR Applications. Mark Billinghurst, HIT Lab NZ, University of Canterbury.
  2. Building Compelling AR Experiences. Layers: experiences (evaluation); applications (interaction); tools (authoring); components (tracking, display). Sony CSL © 2004.
  3. Introduction
  4. The Interaction Design Process
  5. The Interaction Design Process
  6. Why Evaluate AR Applications? To test and compare interfaces, new technologies, and interaction techniques. To test usability (learnability, efficiency, satisfaction, ...). To get user feedback. To refine the interface design. To better understand your end users.
  7. Survey of AR Papers. Edward Swan (2005) surveyed the major conferences/journals from 1992-2004: Presence, ISMAR, ISWC, IEEE VR. Summary: 1104 total papers; 266 AR papers; 38 AR HCI (interaction) papers; 21 AR user studies. Only 21 of 266 AR papers had a formal user study: less than 8% of all AR papers.
  8. AR Papers
  9. HIT Lab NZ Usability Survey. "A Survey of Evaluation Techniques Used in Augmented Reality Studies", Andreas Dünser, Raphaël Grasset, Mark Billinghurst. Reviewed publications between 1993 and 2007; extracted 6071 papers which mentioned "Augmented Reality" and searched them to find 165 AR papers with user studies.
  10. [Chart: number of publications mentioning "Augmented Reality" per year, 1992-2007, broken down by publisher: ACM Digital Library, SpringerLink, IEEE Xplore, ScienceDirect, SPIE Digital Library, InformaWorld, MIT Press Journals, and others.]
  11. Types of User Studies. Types of AR user studies: perception; user performance; collaboration; usability of complete systems.
  12. Types of AR User Studies
  13. Types of Experimental Measures Used. Objective measures; subjective measures; qualitative analysis; usability evaluation techniques; informal evaluations.
  14. Types of Experimental Measures Used
  15. Summary. Over the last 10 years: most user studies focused on user performance; the fewest user studies were on collaboration; objective performance measures were used most; qualitative and usability measures were used least.
  16. Types of User Evaluation
  17. What is evaluation? Evaluation is concerned with gathering data about the usability of a design or product by a specified group of users for a particular activity within a specified environment or work context.
  18. Evaluation. Goal: measure the goodness of the application design. Two types: formative evaluation, performed at different stages of development to check that the product meets users' needs; summative evaluation, which assesses the quality of a finished product. This lecture focuses on formative evaluation.
  19. When to evaluate? Once the application has been developed: pros - rapid development, small evaluation cost; cons - rectifying problems requires redesign and reimplementation (design -> implementation -> evaluation -> reimplementation). During design and development: pros - find and rectify problems early; cons - higher evaluation cost, longer development.
  20. Four evaluation paradigms: 'quick and dirty'; usability testing (lab studies); field studies; predictive evaluation.
  21. Quick and dirty. 'Quick & dirty' evaluation: informal feedback from users or consultants to confirm that their ideas are in line with users' needs and are liked. Quick & dirty evaluations can be done at any time. The emphasis is on fast input to the design process rather than carefully documented findings.
  22. Usability Testing. Recording typical users' performance on typical tasks in controlled settings; field observations may also be used. As the users perform these tasks they are watched and recorded on video, and their inputs are logged. This data is used to calculate performance times and errors, and to help explain why the users did what they did. User satisfaction questionnaires and interviews are used to elicit users' opinions.
  23. Laboratory-based Studies. Laboratory-based studies: can be used for evaluating the design or the implemented system; are carried out in an interruption-free usability lab; can accurately record some work situations; some studies are only possible in a lab environment; some tasks can be adequately performed in a lab; are useful for comparing different designs in a controlled context.
  24. Laboratory-based Studies. Controlled, instrumented environment.
  25. Field Studies. Field studies are done in natural settings. The aim is to understand what users do naturally and how technology impacts them. In product design, field studies can be used to: identify opportunities for new technology; determine design requirements; decide how to introduce new technology; evaluate technology in use.
  26. Predictive Evaluation. Experts apply their knowledge of typical users, guided by heuristics, to predict usability problems. Can involve theoretically based models. A key feature of predictive evaluation is that real end users need not be present. Relatively quick and inexpensive.
  27. Characteristics of Approaches
                  Usability testing    Field studies    Predictive
      Users       do task              natural          not involved
      Location    controlled           natural          anywhere
      When        prototype            early            prototype
      Data        quantitative         qualitative      problems
      Feedback    measures & errors    descriptions     problems
      Type        applied              naturalistic     expert
  28. Evaluation Approaches and Methods
      Method           Usability testing    Field studies    Predictive
      Observing        x                    x
      Asking users     x                    x
      Asking experts   x                                     x
      Testing          x
      Modeling                                               x
  29. DECIDE: a framework to guide evaluation. Determine the goals the evaluation addresses. Explore the specific questions to be answered. Choose the evaluation paradigm and techniques. Identify the practical issues. Decide how to deal with the ethical issues. Evaluate, interpret and present the data.
  30. DECIDE Framework. Determine goals: what are the high-level goals of the evaluation? Who wants the evaluation, and why? Explore the questions: create well-defined, relevant questions. Choose the evaluation paradigm: this influences the techniques used and how the data is analyzed. Identify practical issues: how to select users, stay on budget and schedule, find evaluators, and select equipment.
  31. DECIDE Framework. Decide on ethical issues: informed consent form; participants have a right to know the goals of the study and what will happen to the findings, and to privacy of personal information. Evaluate, interpret and present data: reliability - can the study be replicated? Validity - is it measuring what you thought? Biases - is the process creating biases? Scope - can the findings be generalized? Ecological validity - is the environment influencing the results?
  32. Usability Testing
  33. Pilot Studies. A small trial run of the main study; it can identify the majority of issues with an interface design. Pilot studies check: that the evaluation plan is viable; that you can conduct the procedure; that interview scripts, questionnaires, experiments, etc. work appropriately. Iron out problems before doing the main study.
  34. Controlled experiments. The designer of a controlled experiment should carefully consider: the proposed hypothesis; the selected subjects; the measured variables; the experimental methods; data collection; data analysis.
  35. Variables. Experiments manipulate and measure variables under controlled conditions. There are two types of variables. Independent: variables that are manipulated to create different experimental conditions, e.g. the number of items in menus, the colour of the icons. Dependent: variables that are measured to find out the effects of changing the independent variables, e.g. the speed of menu selection, the speed of locating icons. Test conditions: the levels, values, or settings for an independent variable, e.g. HMD, handheld device 1, handheld device 2.
  36. "Other" Variables. Control variables, e.g. room light, noise; if controlled => less external validity. Random variables (not controlled), e.g. fatigue; more influence of random variables => less internal validity. Confounding variables, e.g. practice, previous experience.
  37. Hypothesis. A hypothesis is a prediction of the outcome: what will happen to the dependent variables when the independent variables are changed. To show that the prediction is right, reject the null hypothesis (H0), i.e. the hypothesis that the dependent variables do not change when the independent variables are changed.
  38. Experimental methods. It is important to select the right experimental method so that the results of the experiment can be generalized. There are mainly two experimental methods: between-groups, where each subject is assigned to one experimental condition; within-groups, where each subject performs under all the different conditions.
  39. Experimental methods. [Diagram: in a between-groups design, subjects are randomly assigned to condition 1, 2, or 3 and each performs the experimental task under only that condition; in a within-groups design, each subject performs the experimental tasks under all three conditions, in randomly assigned orders. Both designs feed into statistical data analysis.]
  40. Within vs. Between Subjects. Between-subjects design: each participant is tested on only one level/condition; a separate group of participants is used for each condition (e.g. one group uses the HMD, the other group uses the handheld device). Within-subjects design: each participant is tested on each level/condition (e.g. participants use both the handheld device and the HMD); repeated measurement.
  41. Between Subjects. Sometimes a factor must be between subjects, e.g. gender, age, experience. Between-subjects advantage: avoids interference effects (e.g. practice / learning effects). Between-subjects disadvantage: increased variability, so more subjects are needed. Important: randomised assignment to conditions.
  42. Within Subjects. Sometimes a factor must be within subjects, e.g. when measuring learning effects. Within-subjects advantages: fewer participants needed (all participants serve in all conditions); differences (variability) between subjects are the same across test conditions. Counterbalance the order of presenting conditions: A => B => C, B => C => A, C => A => B. The order is best governed by a Latin square.
  43. Latin Square Design. Each condition occurs once in each row and column. Note: in a balanced Latin square each condition both precedes and follows each other condition an equal number of times.
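Such an ordering is easy to generate programmatically. A minimal Python sketch (an illustration, not from the slides; the construction shown is balanced only for an even number of conditions):

```python
def balanced_latin_square(n):
    """Return n condition orderings (rows) for n conditions, numbered 0..n-1.

    Uses the standard interleaved construction (0, n-1, 1, n-2, ...,
    shifted by one per row). For even n the result is balanced: each
    condition precedes and follows every other equally often. For odd n,
    run both the square and its row-wise reverse to restore balance.
    """
    square = []
    for participant in range(n):
        row, low, high = [], 0, n - 1
        for position in range(n):
            if position % 2 == 0:
                value, low = low, low + 1      # take from the ascending run
            else:
                value, high = high, high - 1   # take from the descending run
            row.append((value + participant) % n)
        square.append(row)
    return square

# Example: orderings of conditions A-D for four participants.
for row in balanced_latin_square(4):
    print("".join(chr(ord("A") + c) for c in row))  # ADBC, BACD, CBDA, DCAB
```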
  44. Subjects. The choice of subjects is critical to the validity of the results of an experiment: the subject group should be representative of the expected user population. In selecting the subjects it is important to consider things such as their age group, education, skills, and culture: how does the sample influence the results? Report the selection criteria and give relevant demographic information in your publication.
  45. Subjects. How many participants? How big is the effect you want to measure? Large effects can be detected with smaller samples (e.g. a small n is needed to discriminate speed between turtles and rabbits). The more participants, the "smoother" the data: by the Central Limit Theorem, as n increases (n > 30) the sample mean approaches a normal distribution, and extreme data has less influence (e.g. one sleepy participant does not mess up the results that much). For quantitative analysis, rule of thumb: a MINIMUM of 15-20 or more per group/cell.
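To turn "how big is the effect?" into a concrete n before running the study, a power analysis can be used. A minimal sketch (not from the slides) using statsmodels for a between-subjects t-test, assuming the conventional alpha = 0.05 and power = 0.80:

```python
from statsmodels.stats.power import TTestIndPower

power_analysis = TTestIndPower()
for d in (0.2, 0.5, 0.8):  # Cohen's d: small, medium, large effect
    n = power_analysis.solve_power(effect_size=d, alpha=0.05, power=0.8)
    print(f"d = {d}: about {n:.0f} participants per group")
# Large effects (the turtles-vs-rabbits case) need far smaller samples
# than small effects; for d = 0.8 this gives roughly 26 per group.
```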
  46. Data Collection and Analysis. The choice of a method depends on the type of data that needs to be collected. In order to test a hypothesis, the data has to be analysed using a statistical method, and the choice of statistical method depends on the type of data collected. All the decisions about an experiment should be made before it is carried out.
  47. Observe and Measure. Observations are gathered manually (human observers) or automatically (computers, software, cameras, sensors, etc.). A measurement is a recorded observation. Objective metrics; subjective metrics.
  48. Typical objective metrics: task completion time; errors (number, percent, ...); percent of task completed; ratio of successes to failures; number of repetitions; number of commands used; number of failed commands; physiological data (heart rate, ...); ...
  49. Typical subjective metrics: user satisfaction; subjective performance ratings; ease of use; intuitiveness; judgments; ...
  50. Data Types. Subjective: subjective surveys ("How easy was the task": Likert scale 1-5, from "not very easy" to "very easy"; condition rankings); observations (think-aloud); interview responses. Objective: performance measures (time, accuracy, errors); process measures (video/audio analysis).
  51. Experimental Measures (measure: what does it tell us? / how is it measured?)
      - Timings: performance / via a stopwatch, or automatically by the device.
      - Errors: performance, particular sticking points in a task / by success in completing the task correctly; through experimenter observation; by examining the route walked.
      - Perceived workload: effort invested, user satisfaction / through NASA TLX scales and other questionnaires.
      - Distance traveled and route taken: depending on the application, these can be used to pinpoint errors and to indicate performance / using a pedometer, GPS or other location-sensing system; by experimenter observation.
      - Percentage preferred walking speed: performance / by finding the average walking speed, which is compared with normal walking speed.
      - Comfort: user satisfaction, device acceptability / Comfort Rating Scale and other questionnaires.
      - User comments and preferences: user satisfaction and preferences, particular sticking points in a task / through questionnaires, interviews and think-alouds.
      - Experimenter observations: different aspects, depending on the experimenter and on the observations / through observation and note-taking.
  52. Statistical Analysis. Once data is collected, statistics can be used for analysis. Typical statistical techniques. Comparing two results: unpaired t-test (for between subjects; assumes a normal distribution, an interval scale, and homogeneity of variances); paired t-test (for within subjects; assumes a normal distribution, etc.); Mann-Whitney U-test (between subjects, if the assumptions are not met). Comparing more than two results: analysis of variance (ANOVA), followed by post-hoc analysis with a Bonferroni adjustment; Kruskal-Wallis (does not assume a normal distribution).
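As one way to see how these techniques map onto tools, a minimal Python sketch using SciPy (the task-time numbers are made up for illustration):

```python
from scipy import stats

# Hypothetical completion times (s) for two between-subjects conditions.
hmd_times = [12.1, 14.3, 11.8, 15.0, 13.2, 12.7]
handheld_times = [10.2, 11.5, 9.8, 12.0, 10.9, 11.1]

# Unpaired t-test: between subjects, assumes normality and equal variances.
t_stat, p_t = stats.ttest_ind(hmd_times, handheld_times)

# Mann-Whitney U-test: between subjects, when those assumptions are not met.
u_stat, p_u = stats.mannwhitneyu(hmd_times, handheld_times)

print(f"t = {t_stat:.2f} (p = {p_t:.3f}), U = {u_stat:.0f} (p = {p_u:.3f})")

# For within-subjects pairs use stats.ttest_rel; for more than two
# conditions use stats.f_oneway (ANOVA) or stats.kruskal (Kruskal-Wallis).
```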
  53. Running the study. Offload your brain! Write down instructions; prepare checklists; create templates; print and pitch important information. Try to find an assistant. Print questionnaires and other documents the day before. Rehearse the procedures. Bring your lunch; don't forget to eat (-4 kg in 2 weeks).
  54. Running the study. Treat the participants nicely: prepare candy and drinks and make them feel good. Take the role of a friendly waiter: always stay in the background, but offer assistance if needed. Take notes; document oddities. Nothing is as bad as lost data!! AVOID, AVOID, AVOID.
  55. Running the study. Take many photos of your setup in action. Prepare consent forms if you want to use the pictures for publications.
  56. Field Studies
  57. Field Studies. Field studies are done in natural settings; "in the wild" is a term for prototypes being used freely in natural settings. The aim is to understand what users do naturally and how technology impacts them. Field studies are used in product design to: identify opportunities for new technology; determine design requirements; decide how best to introduce new technology; evaluate technology in use. (www.id-book.com)
  58. Observation. Direct observation in the field: structuring frameworks; degree of participation (insider or outsider); ethnography. Direct observation in controlled environments. Indirect observation, tracking users' activities: diaries; interaction logging.
  59. Ethnography. Ethnography is a philosophy with a set of techniques that include participant observation and interviews. Ethnographers immerse themselves in the culture studied, and need the cooperation of the people being studied. A researcher's degree of participation can vary along a scale from 'outside' to 'inside'. Analyzing video and data logs can be time-consuming; continuous data analysis can be used. Collections of comments, incidents, and artifacts are made.
  60. Direct observation in a controlled setting: the think-aloud technique. Indirect observation: diaries; interaction logs; cultural probes.
  61. Structuring frameworks to guide observation: the person (who?); the place (where?); the thing (what?). The Goetz and LeCompte (1984) framework: Who is present? What is their role? What is happening? Where is it happening? Why is it happening? How is the activity organized?
  62. Predictive Evaluation
  63. Predictive Models. Provide a way of evaluating products or designs without directly involving users. Less expensive than user testing. Usefulness is limited to systems with predictable tasks, e.g. telephone answering systems, mobiles, etc. Based on expert error-free behavior.
  64. Fitts' Law (Fitts, 1954). Fitts' Law predicts that the time to point at an object using a device is a function of the distance from the target object and the object's size. The further away and the smaller the object, the longer the time to locate it and point to it.
  65. GOMS Model. Goals: the state the user wants to achieve, e.g. find a website. Operators: the cognitive processes and physical actions needed to attain the goals, e.g. moving the mouse to select an icon. Methods: the procedures for accomplishing the goals, e.g. drag the mouse over an icon, click on a button. Selection rules: decide which method to select when there is more than one.
  66. GOMS Response Times (Card et al., 1983)
      K     Pressing a single key or button:
            average skilled typist (55 wpm) 0.22 s; average non-skilled typist (40 wpm) 0.28 s;
            pressing shift or control key 0.08 s; typist unfamiliar with the keyboard 1.20 s
      P     Pointing with a mouse or other device on a display to select an object
            (value derived from Fitts' Law, discussed below): 0.40 s
      P1    Clicking the mouse or similar device: 0.20 s
      H     Bringing hands 'home' on the keyboard or other device: 0.40 s
      M     Mentally prepare/respond: 1.35 s
      R(t)  The system response time, counted only if it causes the user to wait: t
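These keystroke-level operator times can be summed to predict expert task times. A minimal sketch (the task breakdown is a made-up example):

```python
# Operator times (s) from the Card et al. (1983) table above.
KLM = {
    "K": 0.28,   # press a key (average non-skilled typist, 40 wpm)
    "P": 0.40,   # point with a mouse (Fitts' Law derived average)
    "P1": 0.20,  # click the mouse
    "H": 0.40,   # move hands to/from the keyboard ("home")
    "M": 1.35,   # mentally prepare/respond
}

# Hypothetical task: think, point at a text field, click, home the hands
# on the keyboard, then type a five-letter word.
sequence = ["M", "P", "P1", "H"] + ["K"] * 5
predicted = sum(KLM[op] for op in sequence)
print(f"Predicted expert task time: {predicted:.2f} s")  # 3.75 s
```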
  67. Expert Inspections. There are several kinds. Experts use their knowledge of users and technology to review application usability. Expert critiques can be formal or informal reports. Heuristic evaluation is a review guided by a set of heuristics, e.g. "visibility of system status" (Jakob Nielsen's heuristics, 1990s). Walkthroughs involve stepping through a pre-planned scenario noting potential problems, e.g. load an AR model, scale it to twice the size, add a new model, etc.
  68. Nielsen's heuristics. Visibility of system status. Match between system and real world. User control and freedom. Consistency and standards. Error prevention. Recognition rather than recall. Flexibility and efficiency of use. Aesthetic and minimalist design. Help users recognize, diagnose, and recover from errors. Help and documentation.
  69. Three Stages for Doing Heuristic Evaluation. 1) A briefing session to tell the experts what to do. 2) An evaluation period of 1-2 hours in which each expert works separately, takes one pass to get a feel for the product, and takes a second pass to focus on specific features. 3) A debriefing session in which the experts work together to prioritize problems.
  70. No. of evaluators & problems
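The curve on this slide is commonly modelled with Nielsen and Landauer's (1993) formula: n evaluators find a proportion 1 - (1 - L)^n of the usability problems, where L is the probability that a single evaluator finds a given problem (about 0.31 averaged over their case studies). A minimal sketch:

```python
L_single = 0.31  # chance one evaluator finds a given problem (Nielsen & Landauer)

for n in (1, 3, 5, 10, 15):
    found = 1 - (1 - L_single) ** n
    print(f"{n:2d} evaluators -> ~{found:.0%} of problems found")
# One evaluator finds ~31%, three ~67%, five ~84%: the basis of the common
# advice that around five expert evaluators uncover most of the problems.
```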
  71. Advantages and Problems. Few ethical and practical issues to consider, because users are not involved. It can be difficult and expensive to find experts; the best experts have knowledge of both the application domain and the users. Biggest problems: important problems may get missed; many trivial problems are often identified; experts have biases.
  72. Case Studies
  73. Types of AR Experiments. Perception: how is virtual content perceived? What perceptual cues are most important? Interaction: how can users interact with virtual content? Which interaction techniques are most efficient? Collaboration: how is collaboration in an AR interface different? Which collaborative cues can be conveyed best?
  74. Perception. A central goal of AR systems is to fool the human perceptual system. Display modes: direct view; stereo video; stereo graphics; multi-modal display. Different objects with different display modes: potential for depth cue conflict.
  75. Perceptual User Studies. Depth / distance studies: estimate the distance to an object; judge relative proximity. Object localization: match physical and virtual object positions. Difficulties: precise alignment / calibration of displays; lag in head tracking (use static images).
  76. Layar – www.layar.com
  77. Outdoor AR: Limited Field of View
  78. Possible solutions. Overview + detail: spatial separation, two views. Focus + context: merges both views into one view. Zooming: temporal separation.
  79. Zooming Views. TU Graz - HIT Lab NZ collaboration. Zooming panorama. Zooming map.
  80. Zooming AR interfaces. Context compass; zooming panorama; zooming map. Interface types: Compass (C); Compass + Zooming Panorama (CP); Compass + Zooming Map (CM); Compass, Zooming Panorama, Zooming Map (CPM).
  81. Experiment Evaluation. 20 subjects (10 M / 10 F). Café finding task. Task 1: find a particular café named "Alpha". Task 2: find the closest café. Experiment measures: time to complete the task; angular distance panned around; subjective survey feedback.
  82. Performance Time
  83. Distance Panned
  84. Results. The compass is good for search, but not for comparison. Zooming (panorama or map) aids comparison. Information has a significant effect. The compass requires more panning. Users felt the compass alone wasn't useful.
  85. Interaction Studies. Stages of interface development: prototype demonstration; adoption of interaction techniques from other interface metaphors; development of new interface metaphors appropriate to the medium; development of formal theoretical models for predicting and modeling user interactions.
  86. Fitts' Law (1964). Relates movement time to index of difficulty: MT = a + b log2(2A/W), where log2(2A/W) = ID. Robust under most circumstances: object tracking, tapping tasks, movement tasks.
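A minimal sketch of this formulation in Python (the coefficients a and b are made-up placeholders; in practice they are fit by linear regression to measured movement times):

```python
from math import log2

def movement_time(amplitude, width, a=0.1, b=0.15):
    """Predicted time (s) to move to a target of the given width at the
    given distance (amplitude), both in the same units."""
    index_of_difficulty = log2(2 * amplitude / width)  # ID, in bits
    return a + b * index_of_difficulty

# Doubling the distance (or halving the target width) adds one bit of
# difficulty, and hence a constant b seconds to the predicted time.
print(f"{movement_time(200, 20):.2f} s")  # ID = log2(20), about 4.32 bits
print(f"{movement_time(400, 20):.2f} s")  # one bit harder
```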
  87. Interaction Study - Reaching. Mason, A. et al. (2001). Reaching Movements to Augmented and Graphic Objects in Virtual Environments. Proc. CHI 2001. Does Fitts' Law hold in an acquisition task? Does Fitts' Law hold when reaching for virtual objects? Does Fitts' Law hold when you can't see your hand?
  88. Experimental Setup. Enhanced Virtual Hand Lab. Half-silvered mirror. Shutter glasses. OPTOTRAK optical tracker, with IREDs worn on the wrist and object. Four target cubes. Conditions: cube size, arm visibility, real/virtual objects.
  89. Kinematic Measures. Movement time. Peak velocity of the wrist. Time to peak velocity of the wrist. Percent time from peak velocity of the wrist.
  90. Results – Movement Time
  91. Results – Velocity Profiles
  92. AR Navigation. Many commercial AR browsers show information in place; how do users navigate to a POI?
  93. 2D vs. AR Navigation?
  94. AR Navigation Study. Users navigate between points of interest. Three conditions. AR: using only an AR view. 2D-map: using only a top-down 2D map view. AR+2D-map: using both an AR and a 2D map view. Experiment measures. Quantitative: time taken, distance travelled. Qualitative: experimenter observations, navigation behavior, interviews, user surveys, workload (NASA TLX).
  95. HIT Lab NZ Test Platform – AR View
  96. HIT Lab NZ Platform – Map View
  97. Distance and Time. No significant differences.
  98. Paths Travelled. Red: AR. Blue: AR + Map. Yellow: Map.
  99. Navigation Behaviour. Depends on the interface: the map doesn't show short cuts.
  100. Survey Responses
  101. User Comments. AR: "you dont know exactly where you are all of the time"; "using AR I found it difficult to see where I was going". Map: "you were able to get a sense of where you were"; "you are actually able to see the physical objects around you". AR+Map: "I used the map at the beginning to understand where the buildings were and the AR between each point"; "You can choose a direction with AR and find the shortest way using the map."
  102. Usability Issues. Screen readability in sunlight. GPS inaccuracies. Compass errors. Touch screen difficulties. No routing information.
  103. Lessons Learned. Users adapt their navigation behaviour to the guide type: the AR interface shows shortcuts; the map interface is good for planning. Include a map view in the AR interface: 2D exocentric, and 3D egocentric. Allow people to easily change between views: they may use the map when far away, and AR when close. It is difficult to accurately show depth.
  104. Collaboration Studies. Remote conferencing. Face-to-face collaboration.
  105. Remote AR Conferencing. Moves conferencing from the desktop to the workspace.
  106. Pilot Study. How does AR conferencing differ? Task: discussing images; 12 pairs of subjects. Conditions: audio only (AC); video conferencing (VC); mixed reality conferencing (MR).
  107. Sample Transcript
  108. Transcript Analysis. Users speak most in the audio-only condition. MR had the fewest words/min and interruptions/min. More results are needed.
  109. Presence and Communication. [Charts: presence rating (0-100) and "could tell when partner was concentrating" ratings (0-14) for the AC, VC, and MR conditions.]
  110. Subjective Comments. Users paid more attention to the pictures. Remote video provided peripheral cues. In the AR condition: difficult to see everything; the remote user was distracting; communication asymmetries.
  111. Face to Face Collaboration. Compare two-person collaboration in: face to face, AR, projection display. Task: an urban design logic puzzle (arrange 9 buildings to satisfy 10 rules in 7 minutes). Subjects: within-subjects study (counter-balanced); 12 pairs of college students.
  112. Face to Face Condition. Moving model buildings.
  113. AR Condition. Cards with AR models. SVGA AR display (800x600). Video see-through AR.
  114. Projection Condition. Tracked input devices.
  115. Task Space Separation
  116. Interface Conditions
                        FtF                         AR                              Projection
      User viewpoint    independent,                independent (private),          common (public),
                        easy to change              easy to change, limited FOV     difficult to change
      Interaction       two-handed, natural object  two-handed, Tangible AR         one-handed, mouse-based,
                        manipulation,               techniques,                     time-multiplexed
                        space-multiplexed           space-multiplexed
  117. Hypothesis. Collaboration with AR technology will produce behaviors that are more like natural face-to-face collaboration than those produced using a screen-based interface.
  118. Metrics. Subjective: an evaluative survey after each condition; a forced-choice survey after all conditions; a post-experiment interview. Objective: communication measures, via video transcription.
  119. Measured Results. Performance: AR collaboration was slower than FtF and Projection. Communication: pointing/picking gesture behaviors were the same in AR as in FtF; deictic speech patterns were the same in AR as in FtF; both were significantly different from the Projection condition. Subjective: FtF was easier for working together and understanding; interaction in AR was easier than in Projection and the same as in FtF.
  120. Deictic Expressions. [Chart: percentage of deictic expressions (0-30%) in the FtF, Projection, and AR conditions.] Significant difference: ANOVA, F(2,33) = 5.77, p < 0.01. No difference between FtF and AR.
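As a quick aside, a reported statistic like F(2,33) = 5.77 can be checked against its p-value with the F-distribution's survival function (a sketch, not part of the original analysis):

```python
from scipy.stats import f

p = f.sf(5.77, dfn=2, dfd=33)  # P(F >= 5.77) with 2 and 33 degrees of freedom
print(f"p = {p:.4f}")          # about 0.007, consistent with p < 0.01
```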
  121. Ease of Interaction. Significant differences: Pick, F(2,69) = 37.8, p < 0.0001; Move, F(2,69) = 28.4, p < 0.0001.
  122. Interview Comments. "AR's biggest limit was lack of peripheral vision. The interaction was natural, it was just difficult to see. In the projection condition you could see everything but the interaction was tough." Face to face: subjects focused on the task space (gestures were easy to see, gaze difficult). Projection display: interaction was difficult (8/14), not mouse-like, invasion of space. AR display: "working solo together"; lack of peripheral cues = "tunnel vision" (10/14 people).
  123. Face to Face Summary. Collaboration is partly a perceptual task: AR reduces perceptual cues, which impacts collaboration; the Tangible AR metaphor enhances ease of interaction. Users felt that AR collaboration was different from FtF, but the measured speech and gesture behaviors in the AR condition were more similar to the FtF condition than those in the Projection display. Thus we need to design AR interfaces that don't reduce perceptual cues, while keeping ease of interaction.
  124. Case Study: A Wearable Information Space. Head stabilized vs. body stabilized. An AR interface provides spatial audio and visual cues. Does a spatial interface aid performance (task time / accuracy)? M. Billinghurst, J. Bowskill, N. Dyer, J. Morphett (1998). An Evaluation of Wearable Information Spaces. Proc. Virtual Reality Annual International Symposium.
  125. Task Performance. Task: find target icons on 8 pages; remember the information space. Conditions: A - head-stabilized pages; B - cylindrical display with trackball; C - cylindrical display with head tracking. Subjects: within subjects (needs fewer subjects); 12 subjects used.
  126. Experimental Measures. Many different measures. Objective: spatial ability (pre-test); time to perform the task; information recall; workload (NASA TLX). Subjective: post-experiment survey; condition rankings (forced choice); Likert-scale questions, e.g. "How intuitive was the interface to use?"
  127. Post Experiment Survey
      For each of these conditions please answer:
      1) How easy was it to find the target? 1 2 3 4 5 6 7 (1 = not very easy, 7 = very easy)
      For the head-stabilised condition (A):
      For the cylindrical condition with mouse input (B):
      For the head-tracked condition (C):
      Rank all the conditions in order on a scale of one to three:
      1) Which condition was easiest to find the target? (1 = easiest, 3 = hardest) A: B: C:
  128. Results. Body stabilization improved performance: search times were significantly faster (one-factor ANOVA). Head tracking improved information recall: no difference between the trackball and stack cases. Head tracking involved more physical work.
  129. Subjective Impressions. [Chart: ratings (0-5) for "find target" and "enjoyable" in conditions A, B, C.] Subjects felt the spatialized conditions were (ANOVA): more enjoyable; easier for finding the target.
  130. Subjective Impressions. [Chart: rankings (0-3) for "easiest", "understanding", and "intuitive" in conditions A, B, C.] Subject rankings (Kruskal-Wallis): spatialized was easier to use than head-stabilized; body-stabilized gave better understanding; head tracking was most intuitive.
  131. Conclusions
  132. Key Points. There is a need for more user evaluation of AR experiences. There are several evaluation approaches that can be used: 'quick and dirty'; usability testing (lab studies); field studies; predictive evaluation. Studies should use multiple qualitative and quantitative experimental measures.
  133. Resources
  134. Online Resources. Meta-site for statistical analysis: http://home.ubalt.edu/ntsbarsh/stat-data/Topics.htm. Online statistical analysis: http://www.quantitativeskills.com/sisa/. Experiment design: http://en.wikipedia.org/wiki/Design_of_experiments and http://www.curiouscat.net/library/designofexperiments.cfm
  135. Books.
      J. Nielsen, "Usability Engineering", Academic Press, 1993.
      H. Sharp, Y. Rogers, J. Preece, "Interaction Design: Beyond Human-Computer Interaction", John Wiley & Sons, 2007.
      J. Spool, J. Rubin, D. Chisnell, "Handbook of Usability Testing: How to Plan, Design, and Conduct Effective Tests", John Wiley & Sons, 2008.
      T. Tullis, B. Albert, "Measuring the User Experience: Collecting, Analyzing, and Presenting Usability Metrics", Morgan Kaufmann, 2008.
      A. Field, G. Hole, "How to Design and Report Experiments", Sage Publications Ltd, 2003.