'Living Laboratories': Rethinking Ecological Designs and Experimentation in Human-Computer Interaction

Ed H. Chi 
Area Manager and Sr. Research Scientist 
Palo Alto Research Center 

2009 HCIC Workshop at YMCA Camp, Colorado

As a field, early fundamental contributions from:
–  Computer scientists interested in changes in the ways we interact with information systems
–  Psychologists interested in the implications of these changes
Combustible, because:
–  Computer scientists want to create great tools, but don’t know how to measure impact
–  Psychologists want to go beyond classical research on the brain and human cognition
Example:
–  An enduring interest in ‘augmenting human intellect’: V. Bush, Licklider, Engelbart, in turn inspiring Stu Card, Allen Newell, Alan Kay, and many others.

2/13/09 HCIC "Living Lab"

The need to establish HCI as a science
–  Adopt methods from psychology
–  Convenient and fits well with problems at hand
   Issues around personal computing (WIMP)
–  Dual purpose: understand the nature of human behavior and build up a science of HCI techniques.
–  Good examples: Fitts’ Law, models of human memory, cognitive and behavioral modeling, information foraging
–  Stuart K. Card, William K. English, and Betty J. Burr (1978). Evaluation of mouse, rate-controlled isometric joystick, step keys, and text keys for text selection on a CRT. Ergonomics, 21(8):601–613.
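Fitts’ Law, listed above as a good example, can be written down compactly. The sketch below uses the commonly cited Shannon formulation, MT = a + b * log2(D / W + 1), which is an assumption here and not necessarily the variant used in the Card, English, and Burr study. The coefficients a and b are hypothetical placeholders that would normally be fitted to measured data.

```python
import math


def index_of_difficulty(distance: float, width: float) -> float:
    """Index of difficulty in bits, Shannon formulation: log2(D / W + 1)."""
    if distance < 0 or width <= 0:
        raise ValueError("distance must be >= 0 and width must be > 0")
    return math.log2(distance / width + 1)


def movement_time(distance: float, width: float, a: float, b: float) -> float:
    """Predicted movement time: a + b * ID.

    a (intercept) and b (slope) are device-specific constants that must be
    estimated from experiments; any values passed in here are illustrative.
    """
    return a + b * index_of_difficulty(distance, width)


if __name__ == "__main__":
    # Hypothetical target: 70 units away, 10 units wide, so ID = log2(8) bits.
    print(index_of_difficulty(70, 10))
    # Hypothetical coefficients, chosen only to show the call.
    print(movement_time(70, 10, a=0.1, b=0.2))
```

Halving the target width or doubling the distance raises the index of difficulty, which is the kind of controlled, single-user prediction the lab methods on this slide are good at.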

Beyond a user in front of a computer
–  Yet evaluation methods mostly stayed the same
–  Perceived CHI paper template for acceptance
Many problems don’t fit laboratory experimental methods anymore
–  Yesterday’s discussion about Large Data and HCI was largely driven by how HCI evaluation methods need to change to fit the wild
The best examples: trends in Social Computing and UbiComp force us to think about new contexts of use


Old Assumptions                        New Considerations
Single display                         Multiple displays
Knowledge work                         Games, communication, social apps
Isolated worker                        Collaborative and social groups
Stationary location                    Mobile and stationary
Short task durations                   Short and long tasks, and tasks with no time boundaries
Controllable experimental conditions   Uncontrollable experimental conditions


Think about research that has been done on UI animations or flashing icons.
From visual perception, we know motion in the periphery is more noticeable than in the foveal region [DaVinci].


Evaluations surrounding many HCI systems for knowledge work focus on productivity increase, but what about factors for adoption?
–  Argument: if no productivity increase, then adoption is irrelevant
–  But the opposite argument is just as right: if no adoption, no amount of productivity increase shown is relevant!
Academic research often focuses on productivity improvements, increasing the perceived gulf between the ivory tower and the trenches
An example: color copier studies
 


Artificial experimental setups are only capable of telling us about behaviors in constrained situations
–  Ecological considerations
–  Hard to generalize to new task contexts (with interruptions, other tasks, other goals, unfocused attention, more displays)
–  Hard to generalize to other tools, apps
–  Impossible to answer questions about aggregate behaviors of groups
Example problems:
–  Adoption of mobile technology
   iPhones in Japan, single-handed input [PARC]
   Best-selling phones in Indonesia come with a compass [Bell]
–  Aggregate behavior of Wikipedia or Delicious users
   Big data analysis of edit logs


I was a computational molecular biologist
Analogy: biologists work on model plants and genomes in the lab, which tells us how they behave in an isolated environment under controlled conditions, but not how the plant will behave in the real world.
Biologists don’t just study models in the lab, but also in the wild.


Observational Studies
–  Ethnography
–  Social Technical Design
–  Iterative Design
–  Diary Studies
–  Longitudinal Studies with single outcome
Problems:
–  Sampling from non-normal distributions
–  Treat variations in social contexts as ‘noise’
–  What about adoption?
Mixed methods are now popular to partially ameliorate these problems
–  Convergent measures tell us we’re getting closer to the truth


–  similar to Venture Business mentioned by David Millen 
Conduct research on real platforms and services
–  Not to replace controlled lab studies
–  Expand our arsenal to cover new situations
Some principles:
–  Embedded in the real world
–  Ecologically valid situations
–  Embrace the complexity
–  Rely on big-data science to extract patterns
Not the first to suggest this:
–  S. Carter, J. Mankoff, S. Klemmer, and T. Matthews. Exiting the cleanroom: On ecological validity and ubiquitous computing. HCI Journal, 2008
–  EClass [Abowd], PlaceLab [Intille], Plasma Poster [Churchill and Nelson], Digital Family Portrait [Rowan, Mynatt]


Two dimensions
–  1. Whether the system is under the control of the researcher
–  2. Whether the study is conducted in the lab or in the wild

                     System in Control              System Not in Control
Laboratory           (1) Build a system,            (2) Adopt a system,
                     study in the Lab               study in the Lab
Wild (Real World)    (4) Build a system, release    (3) Adopt a system,
                     it, study in the Wild          study in the Wild


Traditional approach; numerous examples
Favored by CHI reviewers
Typical situation is the study of some interaction technique
–  Pen input, gestures, perception of some visualized data, reading tasks, mobile text input
Typical measures are quantitative in nature
–  Performance in time, performance in accuracy, eye tracking, learning measures, user preferences
Issues:
–  Not always ecologically valid
–  Hard to take all interactions into account
–  Often time-consuming, even though we thought we could do it fast.


Harder to find in the literature
Often comparing against an older system as baseline
Typical case is comparison of two systems
–  (one website with another, one word processor vs. another)
–  Which highlighting feature works better
–  Two text input techniques on a cell phone
Typical measures are similar to (1)
Issues:
–  Some similar issues to (1) because it’s in the lab
–  System features not in control, so not able to compare fairly or isolate the feature


http://www.flickr.com/photos/crush_images/3245220458/

Real applications in ecologically valid situations
Real findings can be applied to a running system
Impact of research is more immediate, since the system is already running
Typical case is log analytics with large subject pools
–  Log studies of web sites, real mobile calling usage, web search logs, studies of Wikipedia edits.
Typical measures are stickiness, amount of activity, clustering analysis, correlational analysis
Issues:
–  Factors not in control, findings not comparable
–  Factors cannot be isolated
–  Reasons for failure are often just guesswork
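Two of the log measures named above, amount of activity and stickiness, can be sketched in a few lines. This is a minimal illustration only, not the analysis behind any study mentioned in the talk. The event format (user id, day) is hypothetical, and defining stickiness as the fraction of users active on more than one distinct day is an assumption; other definitions are in common use.

```python
from collections import defaultdict
from datetime import date


def activity_per_user(events):
    """Count logged events per user. `events` is an iterable of (user, day)."""
    counts = defaultdict(int)
    for user, _day in events:
        counts[user] += 1
    return dict(counts)


def returning_fraction(events):
    """Assumed stickiness measure: share of users seen on more than one day."""
    days_seen = defaultdict(set)
    for user, day in events:
        days_seen[user].add(day)
    if not days_seen:
        return 0.0
    returning = sum(1 for days in days_seen.values() if len(days) > 1)
    return returning / len(days_seen)


if __name__ == "__main__":
    # Toy log, invented for illustration; not real usage data.
    log = [
        ("u1", date(2009, 2, 1)),
        ("u1", date(2009, 2, 2)),
        ("u2", date(2009, 2, 1)),
    ]
    print(activity_per_user(log))
    print(returning_fraction(log))
```

Measures like these describe what happened in the wild but, as the issues above note, they do not by themselves explain why.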


Hypothesis: Conflict is what drives Wikipedia forward.
How to study this?
–  Tukey paradigm
–  Get a large paper, and plot the damn data!

–  Downloaded all of Wikipedia and all of the revisions 
–  Hadoop/MapReduce, MySQL, etc. 
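The general shape of a MapReduce-style pass over revision records can be sketched as follows. This is an in-memory illustration in plain Python, not the Hadoop job or schema used in the study. The record layout and the `kind` field with values such as "article", "talk", and "maintenance" are hypothetical, and the sample records are toy data.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def map_revision(rev: dict) -> Iterator[Tuple[str, int]]:
    """Map step: emit (edit kind, 1) for one revision record."""
    yield (rev["kind"], 1)


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Counter:
    """Reduce step: sum the emitted counts per key."""
    totals: Counter = Counter()
    for key, n in pairs:
        totals[key] += n
    return totals


def edit_shares(revisions: Iterable[dict]) -> dict:
    """Percentage of total edits per kind, from the reduced counts."""
    pairs = (pair for rev in revisions for pair in map_revision(rev))
    totals = reduce_counts(pairs)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {kind: 100.0 * n / grand_total for kind, n in totals.items()}


if __name__ == "__main__":
    # Toy records, invented for illustration; not real Wikipedia data.
    sample = [
        {"kind": "article"},
        {"kind": "article"},
        {"kind": "talk"},
        {"kind": "maintenance"},
    ]
    print(edit_shares(sample))
```

On a real dump the map and reduce steps would run distributed over the full revision history; the per-kind percentages are the sort of quantity one would then plot.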


[Figure: percentage of total edits (y-axis, 90% to 100% shown); labeled series: Maintenance]
