A 1-hour introductory lecture on multimodal interaction that I gave to bachelor HCI students. It includes a section on how to get started in this exciting line of research.
Multimodal Interaction!
An Introduction!
Abdallah 'Abdo' El Ali
http://staff.science.uva.nl/~elali/
Some slides adapted from:
- Gabriel Skantze (KTH Royal Institute of Technology, Sweden)
- Denis Lalanne (University of Fribourg, Switzerland)
Who am I?!
Currently: PhD in Mobile Human-Computer Interaction - UvA
- Crossmodal Interaction in Mobile Environments
MSc in Cognitive Science - UvA
- Cognition, Language, & Communication track
BSc in English Language & Literature - American University of Beirut
- Screenwriting, Copywriting, Editing
Outline!
I. Multimodal Interaction & Interfaces
II. Multimodal Input
III. Multimodal Output
IV. Practical Matters
A Brief History of Computer Interfaces!
- Punched cards (late 19th century): Herman Hollerith - Tabulating Machine Company (1896)
- The Command Line Interface (1960s)
- Sketchpad (1963) by Ivan Sutherland: a light-pen, pointer-based system to create and manipulate objects in drawings
- Alto personal computer (1973), developed at Xerox PARC: desktop metaphor, WIMP (windows, icons, menus, pointing device), WYSIWYG
- Xerox 8010 Star Information System (1981)
- Apple Macintosh (1984)
- Microsoft Windows 1.01 (1985)
- Microsoft Windows 3.0 (1990)
- Mac OS X (2000s)
- […]
[Images: Project Natal / Kinect for Xbox 360, PlayStation EyePet, PlayStation Move]
HCI and Human Characteristics!
HCI is a multi-disciplinary topic:
- Computer Science & AI
- Cognitive Science
- Sociology
- Psychology
- Design
- […]
In HCI design, it is important to understand something about:
- Human information processing (cognitive architecture, memory, perception, motor skills, etc.)
- How human action is structured
- The nature of human communication
- Human physical and physiological requirements/constraints
Why HCI?!
Humans are limited in their capacity to process information
- This has implications for interaction design
- Multitasking says it all
Important considerations:
- Input-output channels (senses and effectors)
- Memory
- Learning (acquiring skills)
- Reasoning / problem solving (cognitive activity)
- Decision making
Use Case: Mobile Interaction!
Distinctive aspects of mobile interaction (Chittaro, 2010):
- Hardware: small screen, limited I/O
- Perceptual: noisy street, sunlight reflection, no device contact
- Motor: involuntary movements when in a vehicle, fat-finger problem
- Social: phone ringing at a conference, gestures in front of strangers
- Cognitive: limited attention span, high stress & load, limited memory
Embodiment!
Embodied Cognition, Situated Cognition, Embodied Interaction, EEC, Social Computing, Tangible Computing, Active Perception, […]
Gibson (1979), "The Ecological Approach to Visual Perception":
- "...perceiving is an act, not a response, an act of attention, not a triggered impression, an achievement, not a reflex"
Heidegger (1927), "Being and Time":
- Present-at-hand vs. ready-to-hand
- e.g., hammer as object (presence) vs. hammer as tool (cognitive extension)
- e.g., mouse as hardware vs. mouse as tool for performing GUI operations
Dourish (1999), "Foundations of Embodied Interaction":
- "...interaction is an embodied phenomenon. It happens in the world, and that world (a physical world and a social world) lends form, substance and meaning to the interaction."
[Diagram: Agent ⇄ World loop - perception for action, action for perception, sensori-motor coordination]
Sensation & Perception!
Humans perceive the world through their senses (sensory input) and act on it through the motor control of their effectors
Five major senses:
- Sight
- Hearing
- Touch
- Taste
- Smell
(plus proprioception, thermoception, nociception, …)
Effectors:
- Limbs (arms, legs, body position, …)
- Fingers
- Eyes
- Head / face
- Body
- Vocal system
Man-Machine Interaction!
Interaction can be seen as a dialog between the computer and the user
Interaction styles:
- Command language / command-line interface
- Form-fills and spreadsheets
- Menus
- Natural language and query language
- Question/answer dialog
- WIMP
- Point-and-click
- Direct manipulation
- 3D interfaces (virtual reality)
- Brain-computer interface
Multimodal Interfaces!
Multimodal Interaction: the situation where the user is provided with multiple modes for interacting with a system
Multimodal Interfaces:
- "...process two or more combined user input modes (such as speech, pen, touch, manual gesture, gaze, and head and body movements) in a coordinated manner with multimedia system output. They are a new class of interfaces that aim to recognize naturally occurring forms of human language and behavior, and which incorporate one or more recognition-based technologies (e.g., speech, pen, vision)" (Oviatt et al., 2002)
Multimodality vs. Multimedia!
Modality "refers to the type of communication channel used to convey or acquire information. It also covers the way an idea is expressed or perceived, or the manner an action is performed" (Nigay & Coutaz, 1993)
- Visual, auditory, haptic, etc.
- "Multi-" refers to two or more such modalities being used
Mode "refers to a state that determines the way information is interpreted to extract or convey meaning" (Nigay & Coutaz, 1993)
Multimedia "focuses on the medium or technology rather than the application or user" (Buxton, 1986)
- e.g., a sound clip attached to a presentation
- Media channels: text, graphics, animation, video, etc.
Early Example!
The "Put That There" system (Bolt, 1980)
- Speech and gestures used simultaneously
Why Multimodal Interaction?!
Advantages over GUI and unimodal systems:
- Natural/realistic: makes use of more (appropriate) senses
- New ways of interacting
- Flexible: different modalities excel at different tasks
- Suits wearable computers and small devices (e.g., keyboard-based input requires training)
- Helps the visually/physically impaired
- Faster, more efficient, higher information-processing bandwidth
- Robust: mutual disambiguation of recognition errors
- Multimodal interfaces are more engaging
Why Multimodal Interaction?!
Human-human protocols: initiating conversation, turn-taking, interrupting, directing attention, …
Human-computer protocols: shell interaction, drag-and-drop, dialog boxes, …
- Use more of users' senses
- Users perceive multiple things at once
- Users do multiple things at once (e.g., speak while using hand gestures, body position, orientation, and gaze)
Multimodal Input Overview!
Multimodal input:
- allows humans to communicate naturally
- provides the user with multiple input modalities
- permits multiple styles of interaction
- may be simultaneous or not
- must consider modality fusion and temporal constraints (a toy sketch follows below)
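To make the last bullet concrete, here is a minimal sketch of frame-based fusion with a temporal constraint. It is not from the lecture: the event structure, the `fuse` function, and the 1.5 s window are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class InputEvent:
    modality: str   # e.g., "speech" or "gesture"
    content: str    # recognized token(s)
    t_start: float  # seconds since interaction start
    t_end: float

# Events separated by less than this gap are treated as parts of one
# multimodal command (the value is an illustrative assumption).
MAX_GAP = 1.5  # seconds

def fuse(a: InputEvent, b: InputEvent):
    """Pair two unimodal events into one multimodal command if their
    time spans overlap or nearly touch; otherwise return None."""
    gap = max(a.t_start, b.t_start) - min(a.t_end, b.t_end)
    if gap <= MAX_GAP:
        return {"modalities": (a.modality, b.modality),
                "content": (a.content, b.content)}
    return None

speech = InputEvent("speech", "delete that file", 0.0, 1.2)
point = InputEvent("gesture", "icon:report.pdf", 0.9, 1.1)
print(fuse(speech, point))  # overlapping spans -> one fused command
```

Real systems (e.g., MATCH, discussed below) do much richer semantic integration, but the temporal-constraint idea is the same.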
Speech and Gesture Interaction!
Speech:
- User satisfaction is highly dependent on user profiles and tasks
- The learning rate is fast
- Error handling is getting better
- Perceptual & social usage constraints are important (ambient noise, confidentiality, disturbance, etc.)
- Good spoken languages: short sentences with prosody clearly demarcating the ends of words
Gesture:
- Habits are inherited from mouse usage
- Gesture pointing is direct and reliable (deixis)
- Gesture signs may not be natural, making recognition hard
Fundamental Problems!
- Aligning HCI tasks with modalities (and vice versa)
- Aligning multimodal usage to user profiles (and vice versa)
- Multimodal fusion (input side): the integration of communication modalities in interactive systems
- Multimodal fission (output side): the repartitioning of information among several communication modalities (sketched below)
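Here is the promised toy sketch of fission. It is entirely hypothetical (not from the lecture or any named system): one outgoing message is repartitioned across output modalities depending on the user's context.

```python
def fission(message: str, context: dict) -> dict:
    """Toy output planner: repartition one message across output
    modalities based on which user resources are free."""
    plan = {}
    if context.get("eyes_busy"):         # e.g., user is walking
        plan["speech"] = message         # speak the content
        plan["tactile"] = "alert_pulse"  # vibrate to direct attention
    else:
        plan["visual"] = message         # display on screen
        plan["audio"] = "notify_earcon"  # short non-speech cue
    return plan

print(fission("Turn left in 50 m", {"eyes_busy": True}))
# {'speech': 'Turn left in 50 m', 'tactile': 'alert_pulse'}
```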
MATCH: Multimodal Access to City Help (Johnston et al., 2002)!
Interactive city guide and navigation application: provides restaurant and subway information for NY and DC
- Dynamic map-based interface on a tablet
- Input modalities: speech, pen gesture, handwriting, GUI
- Commands can be speech, pen, or multimodal
- Visual parsing of complex gestural input
- Output modalities: coordinated multimodal output combining synthetic speech and dynamic graphics
Example:
- Speech: "show inexpensive italian places in chelsea"
- Multimodal: "cheap italian places in this area" (accompanied by a pen gesture on the map; see the sketch below)
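As an illustration of what the multimodal command buys here, consider how a deictic phrase like "in this area" could be resolved against the pen-drawn region. This is a hypothetical sketch, not MATCH's actual integration machinery: speech contributes the predicate, the gesture contributes the spatial constraint.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def in_polygon(p: Point, poly: List[Point]) -> bool:
    """Ray-casting point-in-polygon test for the pen-drawn region."""
    x, y = p
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y):
            if x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                inside = not inside
    return inside

# Invented map data: restaurant name -> map coordinates.
restaurants = {"Trattoria Roma": (2.0, 2.0), "Luigi's": (9.0, 9.0)}
pen_region = [(0, 0), (5, 0), (5, 5), (0, 5)]  # circled map area

# Speech supplies the predicate ("italian places"); the gesture
# supplies the spatial constraint ("in this area").
matches = [name for name, pos in restaurants.items()
           if in_polygon(pos, pen_region)]
print(matches)  # ['Trattoria Roma']
```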
NUMACK (Foster and White, 2005)!
NUMACK (Northwestern University Multimodal Autonomous Conversational Kiosk):
- Embodied Conversational Agent (ECA) that gives directions around Northwestern's campus
- Combines speech, gestures, and facial expressions
- Uses a grammar-based computational model of language and a gesture-planning system
- NUMACK's verbal, non-verbal, and multimodal behaviors are realized through synthesized speech and a kinematic body model
- The system updates its model of context and the world by fusing multimodal user input (speech, pen)
- Stereoscopic head-tracking system
Multimodal Input Advantages!
- Improved error handling & efficiency: fewer errors, faster task completion
- Greater expressive power
- Greater precision in visual-spatial tasks (e.g., map scrolling & item localization)
- Support for users' preferred interaction style
- Accommodation of diverse users, tasks & usage environments (e.g., accented speakers & mobile environments)
- Shorter & less complex linguistic constructions (e.g., fewer locative descriptions)
Multimodal Output!
Advantages (Sarter, 2006; Oviatt, 2002):
- Synergy
- Redundancy
- Higher information bandwidth
Wickens' Multiple Resource Theory (1984)
- Are more modalities always better? No: resource competition rises when people have to attend to two sources at once (Reeves et al., 2004)
Mobile Multimodal Interfaces!
The mobile context means attentional and memory resources are limited (Tamminen et al., 2004)
- e.g., scrolling a map while talking with a friend and crossing the street
Potential of multimodal feedback cues in:
1. addressing issues of accessibility, e.g., supporting blind users in navigation (Magnusson et al., 2009)
2. developing pedestrian navigation aids to support situational impairment and awareness (Brewster et al., 2003)
Examples:
- http://www.lalyagaye.com/
- Pocket Navigator (Pielot et al., 2010)
- http://feelspace.cogsci.uni-osnabrueck.de/
- AudioGPS (Holland et al., 2002)
Tactile and Non-Speech Auditory Feedback!
Tactons: "structured, abstract messages that can be used to communicate non-visually" (Brown, 2005)
- Information encoded in parameters such as: waveform, duration, rhythm, spatial location, frequency, […]
Earcons: "non-verbal audio messages that are used in the computer/user interface to provide information to the user about some computer object, operation or interaction" (Blattner, 1989)
- Information encoded in: pitch, amplitude, duration, spatial location, […]
Amodal parameters: information that is not specific to any one sensory modality (Lewkowicz, 1994)
- Parameters common to both tactile and auditory domains (Lewkowicz, 1994; Hoggan et al., 2009): spatial location, rhythm, texture, duration, frequency, intensity/amplitude
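To show how such parameters can carry structured information, here is a hedged sketch. The mappings are invented for illustration (not Brown's or Blattner's actual designs): rhythm encodes the event type, while frequency and amplitude encode urgency, for either an earcon or a tacton.

```python
# Hypothetical parameter mapping for earcons/tactons: rhythm encodes
# the event type; frequency and amplitude encode urgency.
RHYTHMS = {                 # pulse durations in seconds
    "new_message": [0.1, 0.1, 0.1],
    "battery_low": [0.4, 0.2, 0.4],
}
URGENCY = {                 # (frequency in Hz, amplitude 0..1)
    "low": (440.0, 0.3),    # audio Hz; a tacton would instead use
    "high": (880.0, 0.9),   # a vibrotactile band (roughly 100-300 Hz)
}

def design_cue(event: str, urgency: str) -> list:
    """Return a list of (frequency, amplitude, duration) pulses."""
    freq, amp = URGENCY[urgency]
    return [(freq, amp, dur) for dur in RHYTHMS[event]]

print(design_cue("battery_low", "high"))
# [(880.0, 0.9, 0.4), (880.0, 0.9, 0.2), (880.0, 0.9, 0.4)]
```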
Crossmodal Interaction!
A subset of multimodal interaction where the senses receive the 'same' information content across the invoked sensory modalities (Gibson, 1966; Lewkowicz, 1994)
- Cf. sensory substitution (Visell, 2009): the vOICe "seeing with sound" application; Braille
Crossmodal interaction refers to situations where characteristics of one sensory modality may be bi-directionally transformed into the characteristics of another (e.g., audio ⇿ tactile) (Hoggan, 2007; 2009)
- Redundancy
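A minimal sketch of such a bidirectional transform follows. The frequency bands are assumptions loosely inspired by this line of work, not values taken from Hoggan & Brewster: amodal parameters (rhythm, intensity) carry over unchanged, while the modality-specific frequency is remapped between bands.

```python
# Illustrative frequency bands (assumptions, not published values).
AUDIO_RANGE = (200.0, 2000.0)   # Hz, audible pitch band
TACTILE_RANGE = (100.0, 300.0)  # Hz, typical vibrotactile band

def remap(value, src, dst):
    """Linearly remap a value from one frequency band to another."""
    lo_s, hi_s = src
    lo_d, hi_d = dst
    return lo_d + (value - lo_s) / (hi_s - lo_s) * (hi_d - lo_d)

def audio_to_tactile(cue: dict) -> dict:
    out = dict(cue)  # rhythm and intensity carry over unchanged
    out["frequency"] = remap(cue["frequency"], AUDIO_RANGE, TACTILE_RANGE)
    return out

def tactile_to_audio(cue: dict) -> dict:
    out = dict(cue)
    out["frequency"] = remap(cue["frequency"], TACTILE_RANGE, AUDIO_RANGE)
    return out

beep = {"frequency": 880.0, "rhythm": [0.1, 0.1], "intensity": 0.8}
buzz = audio_to_tactile(beep)
print(buzz["frequency"])                    # ~175.6 Hz vibration
print(tactile_to_audio(buzz)["frequency"])  # ~880 Hz again
```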
Crossmodal Output Advantages!
- Unlike multimodal interaction, little risk of information-processing overload
- When one sensory modality is knocked out (e.g., audio by ambient noise, tactile by lack of body contact), the information is still received
- Permits both 'eyes-free' and 'hands-free' interaction
International Communities!
- CHI: ACM Conference on Human Factors in Computing Systems
- MobileHCI: ACM Conference on Human-Computer Interaction with Mobile Devices and Services
- ICMI: ACM International Conference on Multimodal Interaction
- CSCW: ACM Conference on Computer Supported Cooperative Work
- ACM MM: ACM Multimedia Conference
- INTERACT: IFIP Conference on Human-Computer Interaction
- WHC: World Haptics Conference
Resources!
Books:
- Paul Dourish (2004), "Where the Action Is: The Foundations of Embodied Interaction"
- Andy Clark (2003), "Natural-Born Cyborgs: Minds, Technologies, and the Future of Human Intelligence"
- Bill Buxton (2007), "Sketching User Experiences: Getting the Design Right and the Right Design"
- Adam Greenfield (2006), "Everyware: The Dawning Age of Ubiquitous Computing"
Articles:
- Mark Weiser (1991), "The Computer for the 21st Century", Scientific American
- Sharon Oviatt (2002), "Perceptual user interfaces: multimodal interfaces that process what comes naturally", Communications of the ACM
- Sharon Oviatt (1999), "Ten myths of multimodal interaction", Communications of the ACM
- Nadine Sarter (2006), "Multimodal information presentation: Design guidance and research challenges", International Journal of Industrial Ergonomics
- Leah Reeves et al. (2004), "Guidelines for multimodal user interface design", Communications of the ACM
Summary!
- We are embodied and embedded creatures, and this influences the way we interact with the world and with computational artifacts
- Multimodal Interfaces aim at making communication with machines more natural, more efficient, and more engaging
- Multimodal Input and Output focus on different aspects within HCI, requiring different skill sets, but multimodal research and development requires both
- Multimodal Interaction is an exciting and rapidly growing area that hugely benefits from HCI work
Contact!
Abdo El Ali
e: elali@uva.nl
w: http://staff.science.uva.nl/~elali/
t: +31 (0)20 525 8661
Address: Room C3.258, Informatics Institute, Science Park 904, 1098 XH Amsterdam, NL
Slides available at: http://staff.science.uva.nl/~elali/hci_abdo_2011.pdf
References (1)!
Blattner, M. M., Sumikawa, D. A., & Greenberg, R. M. (1989). Earcons and icons: Their structure and common design principles. Human-Computer Interaction, 4(1), 11-44.
Bolt, R. A. (1980). "Put-that-there": Voice and gesture at the graphics interface. SIGGRAPH Comput. Graph., 14(3), 262-270.
Brown, L. M., Brewster, S. A., & Purchase, H. C. (2005). A first investigation into the effectiveness of Tactons. In Proceedings of the First Joint Eurohaptics Conference and Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems (WHC '05). IEEE Computer Society, Washington, DC, USA, 167-176.
Brewster, S., Lumsden, J., Bell, M., Hall, M., & Tasker, S. (2003). Multimodal 'eyes-free' interaction techniques for wearable devices. In Proc. of CHI '03. ACM Press, New York, NY.
Buxton, W. (1986). There's more to interaction than meets the eye: Some issues in manual input. In Norman, D. A. & Draper, S. W. (Eds.), User Centered System Design: New Perspectives on Human-Computer Interaction. Lawrence Erlbaum Associates, Hillsdale, New Jersey, 319-337.
Chittaro, L. (2010). Distinctive aspects of mobile interaction and their implications for the design of multimodal interfaces. Journal on Multimodal User Interfaces, 3(3), 157-165.
Dourish, P. (2000). Embodied interaction: Exploring the foundations of a new approach to HCI. Transactions on Computer-Human Interaction.
Dumas, B., Lalanne, D., & Oviatt, S. (2009). Multimodal interfaces: A survey of principles, models and frameworks. In Lalanne, D. & Kohlas, J. (Eds.), Human Machine Interaction. Lecture Notes in Computer Science, Vol. 5440. Springer-Verlag, Berlin, Heidelberg, 3-26.
Gibson, J. J. (1966). The Senses Considered as Perceptual Systems. Houghton Mifflin, Boston.
Gibson, J. J. (1979). The Ecological Approach to Visual Perception. Houghton Mifflin, Boston.
Heidegger, M. (1927). Being and Time. (Trans. John Macquarrie & Edward Robinson, London: SCM Press, 1962).
Hoggan, E., & Brewster, S. A. (2007). Designing audio and tactile crossmodal icons for mobile devices. In ACM International Conference on Multimodal Interfaces (Nagoya, Japan). ACM Press, 162-169.
References (2)!
Hoggan, E., Raisamo, R., & Brewster, S. A. (2009). Mapping information to audio and tactile icons. In Proceedings of ACM ICMI 2009 (Cambridge, MA, USA). ACM Press, 327-334.
Holland, S., Morse, D. R., & Gedenryd, H. (2002). AudioGPS: Spatial audio navigation with a minimal attention interface. Personal Ubiquitous Comput., 6(4), 253-259.
Kopp, S., Tepper, P., & Cassell, J. (2004). Towards integrated microplanning of language and iconic gesture for multimodal output. ICMI 2004.
Lewkowicz, D. J. (1994). Development of intersensory perception in human infants. In Lewkowicz, D. J. & Lickliter, R. (Eds.), Development of Intersensory Perception: Comparative Perspectives. Norwood, N.J.: Lawrence Erlbaum Associates.
Magnusson, C., Tollmar, K., Brewster, S., Sarjakoski, T., Sarjakoski, T., & Roselier, S. (2009). Exploring future challenges for haptic, audio and visual interfaces for mobile maps and location based services. In Proceedings of the 2nd International Workshop on Location and the Web, 8:1-8:4. New York, NY, USA: ACM.
Nigay, L., & Coutaz, J. (1993). A design space for multimodal systems: Concurrent processing and data fusion. In Proceedings of the INTERACT '93 and CHI '93 Conference on Human Factors in Computing Systems (CHI '93). ACM, New York, NY, USA, 172-178.
Pielot, M., Krull, O., & Boll, S. (2010b). Where is my team: Supporting situation awareness with tactile displays. In Proceedings of the 28th International Conference on Human Factors in Computing Systems (CHI '10). ACM, New York, NY, USA, 1705-1714.
Pielot, M., Poppinga, B., & Boll, S. (2010). PocketNavigator: Vibro-tactile waypoint navigation for everyday mobile devices. MobileHCI 2010, Lisboa, Portugal.
Reeves, L. M., Lai, J., Larson, J. A., Oviatt, S., Balaji, T. S., Buisine, S., Collings, P., Kraal, B., Martin, J. C., McTear, M., Raman, T. V., Stanney, K. M., Su, H., & Wang, Q. Y. (2004). Guidelines for multimodal user interface design. Commun. ACM, 47(1), 57-59.
Visell, Y. (2009). Tactile sensory substitution: Models for enaction in HCI. Interact. Comput., 21(1-2), 38-53.