UX by the numbers: The meaning and value of numbers in UX
Dr Simone Stumpf from City University

Quantitative analysis and summative statistics can be powerful tools in UX but their use needs to be carefully considered. Quantitative results are not context-free – a number may be the answer to the wrong question. Much more important than understanding the answer is understanding the question in order to choose the right method to capture and analyse quantitative data.

  • Also known as…
  • I also run the MSc course in Human-Centred Systems at City University London, so I have lots of experience with numbers and quantitative data, both in industry and in academia.
  • Hands up!
  • Combined, this has wide-reaching implications in UX. Firstly, when you ask users to estimate something (e.g. about their user experience) that involves frequency, numeric ratings or anything that looks like rational decision-making, you'll run into trouble. Secondly, researchers and UX professionals are equally likely to fall into the trap of cognitive bias and unsound methods. For example, the decision of how many users to recruit to a study quite often depends more on practicalities than on sampling of the population and statistical power. Usually, the answer is "well, let's get more than 8 and we'll see if we get a good result".
  • So let’s talk about quantitative approaches in UX. There is quantitative data, which is anything you can count, turn into frequencies or measure as a number. There is also quantitative analysis, which usually means some kind of statistics: for example, a t-test, a Chi-squared, a Fisher’s exact or a Kruskal-Wallis. Now, I’m just showing off… What’s important for statistics is that you have to have a hypothesis, usually a Null hypothesis you reject and an H1 hypothesis which is what you want to accept because you’ve shown a statistically significant difference. (I won’t go into the details of what statistical significance is, but suffice to say you are looking for a pretty drastic effect.) A hypothesis is usually a question with a yes or no answer.
  • Let’s look at some questions that are very commonly asked in UX… let’s go through each of these and see if they use Quantitative data or quantitative analysis or both.
  • Next I will go through 3 examples of quantitative approaches I’ve encountered to answer questions in UX. I’ll discuss some limitations of what I’ve seen and also offer some possible solutions.
  • Firstly, as we have seen before, using a number can be suspect: it could be prone to anchoring. From experience, if this is asked after a user test, the average will be around 3.5. Unless users really really loathe it or really really love it, it’ll always be around 3.5. Why 3.5 and not 3? The 0.5 is for liking the facilitator and your incentives. Even if the numbers were at all meaningful, there are various pitfalls – low sample size, data that is not normally distributed. Usually, researchers ignore these things for convenience.
  • In research, the NASA-TLX is a very popular instrument to measure the workload users perceive when carrying out a task with an interface. It has 6 dimensions, each measured on a 20-point scale without using numbers. It’s very easy to apply and, because so many other people have used it, it’s become pretty standard.
  • However, even this is prone to problems. Without a comparison to a different design, the number again becomes fairly meaningless: even on a user interface that I consider really bad, the feedback was still all in the middle. The problem is also that it is really hard to get a good enough effect, even with a decent sample size. In this study, even though all the softer responses were telling us the With-scaffolding design was better, the TLX still came out not statistically significant. So, what’s the solution here? One perspective would suggest that we shouldn’t measure subjective satisfaction using a quantitative approach at all. However, I still hold that it can be valuable to quantify the user experience, but you need to be careful to limit your expectations of what you can get as an answer to your question.
  • Whenever I read “some” or “many” in a student report my heart sinks. Give me some indication of whether this is prevalent or not, and also what your base sample is so I can interpret it properly.
  • Here is an example of analysing the user experience of two different user interface environments – one mainly textual, the other mainly visual – in a quantitative form. We used an adapted form of the Microsoft desirability toolkit, the Product Reaction Cards, and we were able to show how often certain words occurred, in this case visually through word clouds where the size of a word corresponds to its relative frequency. And then another trick: we counted how many of the selected words were positive and how many negative. This gives us an insight into what is going on, even without resorting to stats.
  • Finally, this is a common question, and one which is relatively tractable if you have very few changes.
  • Often touted as the answer and the best way to test the user experience. Partly, it’s seductive because you have these pretty pictures to look at and include in a client report. However, be clear that these pictures can at best give you an intuitive answer. Once you delve into the numbers themselves you again need to be pretty careful. On its own, an average fixation duration of 0.54 seconds is without context. Is this good or bad? Who knows?! These studies are complex to set up – you have to define “areas of interest” and any questions you have need to be related back to these.
  • In a recent eyetracking study we carried out, we luckily knew our areas of interest right from the start and were able to compare three versions. But the data analysis still turned out to be more complex than first anticipated: we had a low sample size; to do some stats we actually had to get the raw eye movement data and calculate from there; and we were dealing with dynamic content, not just static images.
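The positive/negative reaction-word counting described in the notes above is simple to automate. A minimal sketch in Python, where the sentiment word lists and the example selections are hypothetical illustrations, not the study's actual reaction-card data:

```python
from collections import Counter

# Hypothetical sentiment lists; the real Product Reaction Cards set
# contains 118 words, these few are just for illustration.
POSITIVE = {"clear", "friendly", "fast", "engaging"}
NEGATIVE = {"confusing", "slow", "cluttered", "frustrating"}

def tally(selections):
    """Count positive vs negative reaction words, keeping the base sample."""
    counts = Counter()
    for word in selections:
        if word in POSITIVE:
            counts["positive"] += 1
        elif word in NEGATIVE:
            counts["negative"] += 1
    counts["total"] = len(selections)
    return counts

# Made-up selections from one environment's participants.
visual = ["clear", "fast", "engaging", "confusing", "clear"]
t = tally(visual)
print(f"{t['positive']} positive – {t['negative']} negative (n = {t['total']})")
# → 4 positive – 1 negative (n = 5)
```

Reporting the total alongside the split addresses the "4 out of 5 users" point: a frequency is only interpretable against its base sample.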

Presentation Transcript

  • The meaning and value of numbers in UX. Dr Simone Stumpf, Centre for HCI Design. @DrSimoneStumpf Simone.Stumpf.1@city.ac.uk
  • Everyone loves numbers.
  • My background. Academia: University College London – BSc Computer Science with Cognitive Science, PhD Computer Science, Research Fellow. Oregon State University – Research Manager. City University London – Senior Lecturer. Industry: BT – fraud detection, product management, marketing, project management. White Horse – UX Architect.
  • That makes me 3876 years old.
  • How old was Methuselah when he died? 670 969 1254 2756
  • Cognitive bias and heuristics. Anchoring – any number has a priming effect on number estimates. People, even researchers, are bad at probability, predictions and statistics. [Daniel Kahneman – Thinking, Fast and Slow]
  • Quantitative approaches in UX. Quantitative data – numbers. Quantitative analysis – statistics. For statistical tests you have a hypothesis.
  • Quantitative data and/or analysis? How many problems does a user have using my snazzy new design? What kind of problems does a user have using my snazzy new design? Do you like my snazzy new design? Is this snazzy new design better than the old boring design?
  • 3 quantitative approaches.
  • Do you like my snazzy new design?
  • Let’s ask the user. How much do you like the design on a scale of 1 to 5 (where 5 is best)? Average of ratings across all users. Then, er, do some stats?
  • Way around? NASA Task Load Index (TLX) to assess users’ perceptions of Mental Demand, Physical Demand, Temporal Demand, Performance, Effort and Frustration.
  • Mea culpa! “Responses to TLX questions (Mental Demand, Temporal Demand, Success of Performance, Effort, Frustration) were all around the mid-point of the scale.” On an interface which was truly hateful! “However, the [Condition 1] participants showed no significant difference to [Condition 2] participants’ TLX scores.” – 62 participants At least our sample size wasn’t shabby and we did some stats.
  • What kinds of problems does a user have with my snazzy new design?
  • Hold on – is that a trick question? Surely, that’s qualitative analysis! Yes, but no. It starts out that way but then I expect frequencies to back this up. No stats though, thanks. “4 out of 5 users could not find the Purchase button.”
  • Count them! Textual environment: 62 positive – 69 negative. Visual environment: 101 positive – 37 negative.
  • Is this new design better than my old design?
  • Easy-peasy. I’ll use a between-subject design using objective measures. Like… eye tracking! What could be more objective than where people look?
  • Lots of numbers – First Fixation Duration, Fixation Duration, Time to First Fixation, … [http://uxmag.com/articles/eye-tracking-the-best-way-to-test-rich-app-usability]
  • Well, that was fun. There was a highly significant difference in the number of fixations between versions (Χ2(2,N=4257)=22.25, p<0.001). Each participant on average fixated 240.83 times in version 1, 259.83 times in version 2, yet only 209.33 times in version 3. The average fixation duration between versions was also different (ANOVA, F(2,4257)=13.30, p<0.001), with participants in version 1 spending on average 0.57 seconds per fixation, 0.56 seconds in version 2 but 0.69 seconds in version 3. Yay – we did stats! There were results! The total fixation duration is the sum of all individual fixations’ durations. There was no statistical significance between participants’ total fixation durations (Kruskal-Wallis, H(2,N=18), p=0.236). Oh bum…
  • To summarise. Try and quantify as much as possible but be clear about limitations of what you can measure. Descriptive statistics are good but are relatively meaningless without context. If you must use statistical tests, please make sure they are appropriate.
  • Numbers are awesome. Be clear about your questions and the best way to answer them. @DrSimoneStumpf Simone.Stumpf.1@city.ac.uk
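To make the chi-squared result in the transcript less mysterious, here is a minimal sketch of the statistic behind a report like “Χ2(2, N=4257) = 22.25”: a goodness-of-fit test comparing observed fixation counts across three versions against an even split. The fixation counts below are made up for illustration, not the study’s data.

```python
def chi_squared_uniform(observed):
    """Goodness-of-fit statistic against equal expected frequencies:
    sum of (observed - expected)^2 / expected over all categories."""
    n = sum(observed)
    expected = n / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

# Hypothetical total fixation counts for versions 1-3 (N = 4257 overall).
fixations = [1445, 1559, 1253]
stat = chi_squared_uniform(fixations)
print(f"chi2 = {stat:.2f}, df = {len(fixations) - 1}")
# → chi2 = 33.71, df = 2
```

The statistic alone is not the full answer: it must be compared against the chi-squared distribution with the given degrees of freedom to obtain a p-value, which is where an appropriate test choice (the talk’s closing point) matters.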