Multisensory cues can facilitate or impair driving performance depending on their congruency. This proposal describes an experiment that tests this using a lane change test: visual lane-change cues are presented with concurrent auditory cues varying in spatial, temporal, and semantic congruency. Response times will be measured to determine how congruent and incongruent multisensory cues affect driving performance relative to visual-only cues. The results could inform the design of in-vehicle multimodal displays.
Background and Intro

Multisensory Cue Congruency in Lane Change Test
Yuanjing Sun
Advisor: Myounghoon Jeon
Dec 4th, 2015
The mobile internet extends interaction with web applications to the driver's seat. Multitasking while driving has become an inevitable challenge, not only for drivers but for the whole automobile industry.
Even though 14 states prohibit drivers from using hand-held cell phones while driving, a large portion of injury crashes is classified as distraction-related (Nhtsa.gov, 2015). Many institutes and organizations have therefore tried to develop well-designed in-vehicle technologies to support driving tasks.
1.1 Background
IVIS assistance should not cause overload!
Wickens' Multiple Resource Theory
• Extends information-receiving channels beyond vision
Reliability issues of research protocols
• Various research protocols and simulators
• A European project, "Adaptive Integrated Driver-vehicle interfacE" (AIDE) (Engström et al., 2004)
• A U.S. project, "SAfety VEhicle using adaptive Interface Technology" (SAVE-IT)
Multiple resource theory (MRT) describes how information travels through different channels in the brain depending on its modality (Wickens, 2008). MRT suggests a way to provide as much information to drivers as their workload permits. A well-designed multimodal interface can extend information-receiving channels beyond vision; for example, speech recognition and vibrotactile seat notifications have become popular in recent car generations. However, we need to identify the strengths and weaknesses of multimodal interfaces.
There have been efforts to construct methodologies for measuring driving distraction caused by interaction with IVIS, but previous studies vary in experimental settings, using different research protocols and simulators, which raises reliability issues.
1.2 Why Use the Lane-Change Test (LCT)?
• The Lane Change Test (ISO 26022) is a simple laboratory dual-task method that quantitatively measures performance impairment in a primary driving task (Mattes, 2003).
• Economical, low-fidelity driving simulator
• Previous data provide high validity
• Discrete events under a continuous visual task
• Decomposes driving performance into event detection (RT) and lateral maneuver (lane deviation), which provides a speed-accuracy tradeoff
The Lane Change Test compensates for these issues because it is a standardized methodology for measuring distraction. It is an easy-to-use laboratory method that quantifies distraction by measuring performance impairment on the primary driving task.
1.3 Motivation: How to Test Whether IVISs Facilitate or Distract Drivers?
• How will spatially or temporally incongruent audio-visual cues impact driving performance?
• An Auditory Spatial Stroop (Baldwin, 2012) experiment to measure the variance of driving performance under multimodal cue combinations.
Although an IVIS provides driving-related information, it still occupies part of the driver's attentional resources. It is hard to tell whether this so-called assistance facilitates or distracts driving performance because there is still a gap in the evaluation of IVIS utility. Thus, I propose an Auditory Spatial Stroop experiment to investigate how driving performance varies under different multimodal cue combinations.
The Auditory Spatial Stroop experiment investigates whether the location or the meaning of a stimulus more strongly influences performance when the two conflict. For example, the word "LEFT" or "RIGHT" is presented from a position corresponding to or opposite from its meaning. This simulates the complexity of the driving environment: for example, your navigation device tells you to turn right while the collision avoidance system warns you that a hazard is approaching from the right.
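To make this mapping concrete, here is a minimal R sketch that classifies a trial on both dimensions; the function and argument names are mine, purely illustrative:

# Hypothetical sketch: classify a verbal-cue trial by spatial and semantic
# congruency with the visual target. Names are illustrative, not from the study.
classify_trial <- function(cue_side, cue_meaning, target_side) {
  c(spatial  = cue_side    == target_side,   # where the cue is played from
    semantic = cue_meaning == target_side)   # what the cue says
}
classify_trial(cue_side = "right", cue_meaning = "left", target_side = "left")
#  spatial semantic
#    FALSE     TRUE   (spatially incongruent, semantically congruent)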
Literature Review
The information processing framework can be divided into various sub-processes depending on the perspective taken. Both multimodal/crossmodal facilitation and inhibition have been studied under different theories and mechanisms. I will review the models and theories most closely related to this proposal; note that the review is not intended to be exhaustive.
2.1 Hierarchical Driving Behavior Model
• Michon's hierarchical model (1985)
The strategic level, e.g., planning the route according to traffic information.
The maneuver level, e.g., negotiating curves and intersections, performing lane-change maneuvers, overtaking, and avoiding obstacles.
The operational level, e.g., braking or shifting; two-choice single tasks that are more a matter of reflexive car control.
Because driving is such a complicated task, Michon divided it into three levels; from highest to lowest, these are the strategic level, the maneuver level, and the operational level.
The strategic level involves general trip planning, including selecting among alternative trips and evaluating their costs and risks.
The maneuver level involves negotiating common driving situations (e.g., negotiating curves and intersections, gap acceptance when overtaking or entering the traffic stream, performing lane-change maneuvers, and avoiding obstacles).
The lowest, operational level involves single tasks such as braking, shifting, etc.
The present proposal focuses on maneuver-level behavior because it requires perceptually processed signals and the integration of visual and spatial information in the driving environment. The impact of multimodal representation in IVIS design will be directly reflected in maneuver-level performance.
2.2 Wickens’ Multiple Resource Theory (MRT)
Figure 1. The three-dimensional (cube) structure of the Multiple Resource model (Wickens et al., 2013).
5. Wickens’ (Wickens et al., 2013) multiple resource theory (MRT) is a model to predict interference
between two concurrently presented signals. It is composed of four dimensions as figure one
shows. The four dimensions are stages, modalities, accesses (i.e. “codes” in earlier version) and
responses.
The MRT suggested that two tasks demanding separate resources along these four dichotomous
dimensions will improve the overall time-sharing performance. And it less impairs either task than
those tasks occupying the same resources. In this big model, my study only focuses on this part,
which is related to verbal and spatial codes, and audio / visual modalities.
2.3 Rules for Crossmodal Facilitation

Facilitation
• Lip-reading
• Crossmodal synesthesia

Inhibition
• Multisensory illusion (e.g., McGurk effect)

How will multimodal cues have benefits over unimodal cues?
• Spatial rule
• Temporal rule
• Synchrony benefit in synesthesia vs. asynchrony benefit in Posner's (1973) preparation function
However, multimodal time-sharing is not always beneficial. Researchers find that crossmodal signals can both enhance and degrade information processing; facilitation can be offset by inhibition effects such as multisensory illusions. In audiovisual speech studies, it is generally believed that lip-reading enhances speech comprehension against a noisy background. However, the McGurk effect is an exception.
The McGurk illusion (McGurk & MacDonald, 1976) describes a phenomenon in which the sound /ba/ tends to be perceived as /da/ when it is paired with the visual lip movement for /ga/. Incongruent audiovisual cues might thus inhibit multimodal processing. This raises the question: how do multimodal cues gain benefits over unimodal ones?
Crossmodal synesthesia describes a condition in which a person experiences sensations in one modality when a second modality is stimulated (Olsheski, 2014). Two rules facilitate crossmodal benefits: a spatial rule and a temporal rule (Baldwin, 2012, p. 191).
Spatial rule. In Spence's (2010) review of crossmodal spatial attention, the spatial rule is defined as the RT benefit of ipsilateral cues over contralateral cues for the visual and auditory modalities. If the auditory and visual signals come from the same side, they are more likely to be integrated as one event.
Temporal rule. The temporal rule holds that responses to multimodal cues benefit from temporal synchrony between the visual and auditory cues because of their maximal overlap. However, synchrony benefits may not explain every case. Posner, Klein, Summers, and Buggie (1973) asked participants to respond with a left or right key to a target occurring either left or right of a fixation point. Their study proposed a "preparation function": response time varies as a function of the SOA between the priming tone and the visual target stimulus, and a 200 ms SOA showed the best performance. (If both synchrony and asynchrony can benefit multisensory processing, how do we choose in IVIS design?)
2.4 Temporal Rule (cont'd): Colavita Bias and SOA for Crossmodal Facilitation

Colavita bias (minimum SOA)
• Participants respond more often to the visual component than to the auditory one; vision is the dominant modality in audiovisual perception.

Unity effect (maximum SOA)
• A cue-target pair can be perceived as one integrated event under simultaneous attention or as two separate multitasking events.
The SOA affects the occurrence and strength of crossmodal binding. Can auditory cues and visual cues be processed equally?
The Colavita bias is a visual dominance phenomenon in which participants respond more often to the visual cue than to the auditory cue even when the two stimuli are presented at the same time. Neglect of the auditory cue might cause a misjudgment of the prior-entry modality, against the researcher's original intention.
To avoid the Colavita bias, we need a minimum SOA. On the other side, if the SOA is too large, the pair will be perceived as two separate events: an appropriate SOA leads to simultaneous attention, while a too-large SOA turns the task into multitasking. My study focuses on simultaneous attention rather than multitasking; in particular, participants should perceive the two cues as one unified event.
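As a minimal R sketch of this constraint; the 50 ms and 500 ms bounds are purely illustrative assumptions, not values from the literature:

# Hypothetical sketch: keep only SOAs inside an assumed "unity window",
# above an assumed minimum (to avoid the Colavita bias) and below an
# assumed maximum (beyond which the pair is perceived as two events).
is_unified <- function(soa_ms, min_soa = 50, max_soa = 500) {
  soa_ms >= min_soa & soa_ms <= max_soa
}
is_unified(200)  # TRUE: e.g., the 200 ms SOA from Posner et al. (1973)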
2.5 Type & Demand of Visual Tasks
Type
• Visual scanning (discrete) task vs. visual tracking (continuous) task
• A meta-analysis compared auditory-visual (AV) with visual-visual (VV) tasks:
• AV has a 15% advantage over VV on average across 29 studies when the task is discrete (Wickens et al., 2011).
• Driving is a visual tracking task, but the Lane-Change Test is a discrete task, so auditory cues should facilitate performance.
Demand
• Perceptual load (frequency of visual targets) and working memory load (number of response alternatives)
• Both multisensory facilitation and inhibition can be demonstrated by changing the task type and visual demand.
Perception has a limited capacity but processes stimuli in an automatic, involuntary fashion until the free capacity is drained.
Wickens et al. (2013) suggested that a crossmodal display may benefit a visual scanning task but not a continuous visual tracking task. A meta-analysis compared auditory-visual (AV) tasks with visual-visual (VV) tasks; the results indicated that auditory presentation offered a 15 percent advantage (collapsed over both speed and accuracy) over visual-only presentation when the task is discrete.
The level of visual task demand can influence crossmodal facilitation. Sinnett, Soto-Faraco, and Spence (2008) manipulated perceptual load (frequency of visual targets) and working memory load (number of response alternatives) to compare crossmodal benefits under these two variables. The same crossmodal stimuli can either facilitate or inhibit performance depending on the task type and visual demand. That is why I need to control speed in my experiment: it serves two purposes, 1) controlling the perceptual load and 2) controlling for individual differences in driving style. Otherwise the facilitation and inhibition would cancel each other out.
Hypothesis
Based on this background, the proposed study aims to investigate how congruent or incongruent (temporal, spatial, and semantic) multimodal cues (auditory and visual) influence driving performance. To test this, I will use the LCT as a surrogate driving task.
3.2 Hypotheses
• H1A: Congruent crossmodal cue-target pairs will have shorter RTs than unimodal (visual-only) conditions.
• H1B: Incongruent crossmodal cue-target pairs will have longer RTs than unimodal (visual-only) conditions.
• H1C: Congruent crossmodal pairs will have shorter RTs than incongruent crossmodal pairs.
• H2A: Asynchronous pairs will have shorter RTs than unimodal (visual-only) conditions.
• H2B: RTs with synchronous crossmodal cues will not be longer than those in unimodal (visual-only) conditions.
• H3A: When verbal cues are spatially incongruent with the visual target, they will delay RT.
• H3B: When verbal cues are spatially incongruent but semantically congruent with the visual target, RTs will still be longer than in unimodal (visual-only) conditions.
Experiment Method
4.1 Participants
• Forty participants (20 male and 20 female) will be recruited through the MTU SONA system.
• Native English speakers, at least 18 years old, who have held a driving license for more than 2 years.
• An equivalent-hearing test will be given on a training track before the real experiment starts.
4.2 Stimuli
• Visual cues are composed of an arrow sign and cross signs, as shown below.
• Auditory cues are composed of four nonverbal cues and four verbal cues.
[Figure: the "Start" sign, arrow sign, and cross sign; the nonverbal cues; and the verbal cues "Left", "Right", "Lef-left", "Righ-right".]
The "Lane Change" signs appear in an overhead position of a gate on the simulated roadway. ("They
are composed of one arrow sign and two cross signs in three separate black borders. There is one “START” sign and one “FINISH”
sign in the beginning and end of each track. )
Two types of nonverbal stimuli and four verbal stimuli will be used as auditory cues. The nonverbal cues will be 500 Hz tones of 350 ms duration, presented from either the left or the right, congruent or incongruent depending on the trial. A repeated tone indicates moving two lanes instead of one (i.e., from the leftmost lane to the rightmost lane, or vice versa).
The four verbal cues are "LEFT", "RIGHT", "LEF-LEFT", and "RIGH-RIGHT". The verbal cues will be produced with the same length and loudness as the nonverbal cues (Chan & Or, 2012). They will be delivered through a headset, with cue length and volume controlled.
The speech clips "LEFT" and "RIGHT" were recorded through a free online text-to-speech (TTS) service (Fromtexttospeech.com, 2015); I chose a medium-speed female voice (Laura, US English) as the original speech file. For the sped-up clips "LEF-LEFT" and "RIGH-RIGHT", I imported the original TTS files into Audacity 2.1.0 and replicated each word onto two tracks. For the first track the first vowel was preserved, and for the second track the second vowel was preserved. The last step was to adjust the pitch so that the duration of the two tracks stayed within 350 ms.
Verbal cues have two levels of properties: spatial and semantic. Thus, the mapping between a verbal cue and the visual target involves both spatial congruency (the physical location of the cue relative to the visual indication) and semantic congruency (the meaning of the cue relative to the visual indication). For example, when the visual sign indicates a change to the left lane and the participant hears the verbal cue "LEFT" from the right speaker, the trial counts as semantically congruent but spatially incongruent.
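To make the nonverbal-cue construction concrete, here is a minimal R sketch using the tuneR package; the 0.1 s gap in the repeated cue and the file names are illustrative assumptions, not values from the study:

library(tuneR)

fs   <- 44100
tone <- sine(freq = 500, duration = 0.35, samp.rate = fs, xunit = "time")  # 500 Hz, 350 ms

# pan the mono tone hard left or hard right by silencing the other channel
silent    <- rep(0, length(tone@left))
left_cue  <- normalize(Wave(left = tone@left, right = silent, samp.rate = fs, bit = 16), unit = "16")
right_cue <- normalize(Wave(left = silent, right = tone@left, samp.rate = fs, bit = 16), unit = "16")

# repeated tone (move two lanes): two pulses separated by an assumed 0.1 s gap
gap   <- silence(duration = 0.1, samp.rate = fs, xunit = "time")
pulse <- c(tone@left, gap@left, tone@left)
double_left <- normalize(Wave(left = pulse, right = rep(0, length(pulse)),
                              samp.rate = fs, bit = 16), unit = "16")

writeWave(left_cue,    "nonverbal_left.wav")     # hypothetical file names
writeWave(double_left, "nonverbal_left_x2.wav")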
4.3 Apparatus & Scenario

[Figure: plot of a whole track, showing the driving trajectory and the steering wheel angle.]
The scenario was developed according to ISO 26022 on the basis of OpenDS v2.5. It requires participants to change lanes as rapidly as possible as soon as they receive the signal. The performance measures, compared against the unimodal condition, reflect multimodal facilitation or impairment.
The course is straight, with three lanes and 18 lane-change signs. Each track is about 2 minutes long, with around 6 seconds between lane changes.
4.4 Experimental Design
• Within-subjects variables: 2 (timing) × 3 (modality) × 2 (congruency)

Track assignment by cue type, congruency, and timing:

Condition                                                  Synchrony   Asynchrony
Unimodal / visual-only                                     Track 0     Track 13
Nonverbal: spatial congruency                              Track 8     Track 4
Nonverbal: 78% spatial incongruency                        Track 2     Track 6
Verbal: spatial & semantic congruency                      Track 12    Track 1
Verbal: 78% spatial congruency, semantic incongruency      Track 5     Track 11
Verbal: 78% spatial incongruency, semantic congruency      Track 10    Track 7
Verbal: 78% spatial incongruency, semantic incongruency    Track 3     Track 9

The experiment is a 2 (timing) × 3 (modality) × 2 (congruency) within-subjects design. Timing-wise, there are synchrony vs. asynchrony conditions, referring to the temporal gap between the audio and visual stimuli. Modality-wise, there are visual, verbal, and nonverbal cues. Each participant will perform a total of 14 tracks, consisting of two control conditions (visual-only tracks) and twelve crossmodal conditions (four nonverbal and eight verbal).
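As a quick check on the cell count, a minimal R sketch (the labels are mine, purely illustrative):

# Hypothetical sketch: enumerate the crossmodal cells of the design
nonverbal <- expand.grid(timing  = c("synchrony", "asynchrony"),
                         spatial = c("congruent", "incongruent"))
verbal    <- expand.grid(timing   = c("synchrony", "asynchrony"),
                         spatial  = c("congruent", "incongruent"),
                         semantic = c("congruent", "incongruent"))
nrow(nonverbal) + nrow(verbal) + 2   # 4 + 8 + 2 visual-only baselines = 14 tracks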
4.4 Experimental Design (cont'd): Counterbalancing
• Table 3. The exposure order of the fourteen tracks

Order A: 1st visual-only track – audio/visual tracks (e.g., 1>2>3>4>5>6) – 2nd visual-only track – audio/visual tracks (7>8>9>10)
Order B: 1st visual-only track – audio/visual tracks (e.g., 10>9>8>7>6>5) – 2nd visual-only track – audio/visual tracks (4>3>2>1)

The exposure order of the 14 tracks is shown in Table 3. Participants will be randomly assigned to two groups for counterbalancing. The two orderings vary timing and modality as much as possible, so participants can hardly adapt to the visual and auditory cue patterns.
4.5 Procedure

1. Sign consent form
2. Adjust driving seat and watch instruction video
3. Training track and equivalent hearing test
4. Start 1st baseline (visual-only) track
5. Audio-visual tracks (counterbalanced across participants)
6. 2nd baseline (visual-only) track in the 7th, 8th, 9th, or 10th run
7. Complete the remaining tracks
8. Debrief and thank
After signing the consent form, participants will watch an instruction video giving an overview of the experiment and guidance on how to use the driving simulator. The video will show participants how to change lanes quickly and efficiently when the lane-change symbol appears in a training task. Participants must complete a training track containing all possible combinations of multimodal signals that might appear in the subsequent driving task. Before the real test starts, participants will adjust the seat and take an equivalent-hearing test on the training track. Four words (LEFT, LEF-LEFT, RIGHT, RIGH-RIGHT) will be played through headphones at various loudness levels; 50% correctness passes the test. The real experiment will start once participants confirm that they understand the whole process. An RT histogram will pop up when the participant finishes each track.
Variable & Metrics
5.1 Criteria & Metrics
• Reaction Time (= Event detection)
• Accuracy & Error Type (= Percent of correct lane)
• Lateral Control (= Mean deviation)
1/18/2016 23/30
Speed
accuracy
tradeoff
5.2 Reaction Time
• The reaction timer starts when the first cue appears.
• The reaction timer ends when the car has remained straight in the targeted lane for two seconds.
• These two seconds are subtracted from the total RT.
1/18/2016 24/30
200
msec
40 meter
Asynchronous
Audio cues
generated
Visual cues
generated
Lateral control
after completing
lane change
The maximum RT window for a correctly completed lane change is either seven seconds or 117 meters after the lane-change sign, the default in the OpenDS Reaction Task settings.
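To make the RT definition concrete, here is a minimal R sketch, assuming a logged trial in a data frame `trial` with columns time (s), lane, and heading (deg); the column names and the 2-degree straightness tolerance are illustrative assumptions, not OpenDS settings:

# Hypothetical sketch: RT = first cue onset until the car has stayed
# straight in the target lane for 2 s, with those 2 s subtracted.
reaction_time <- function(trial, cue_onset, target_lane,
                          hold = 2, straight_tol = 2) {
  in_target <- trial$lane == target_lane & abs(trial$heading) < straight_tol
  for (i in which(in_target & trial$time > cue_onset)) {
    window <- trial$time >= trial$time[i] & trial$time <= trial$time[i] + hold
    if (all(in_target[window])) {
      # the timer stops after the 2 s hold, which is then subtracted
      return((trial$time[i] + hold) - cue_onset - hold)
    }
  }
  NA_real_  # no stable lane keeping in the trial: NA instead of an RT
}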
5.3 Accuracy & Error Type
• “Correct LC”: the end position of the
driver is in the attended lane,
• “No LC”: the driver is in the same Li zone
at start and end positions,
• “Erroneous LC”: the end position of the
driver is in another lane than in the
attended one.
• “Loss of Control LC”: the end position of
the driver is in one of the Oi zone,
The lateral position was measured relative to the road (not relative to a specific lane). The accuracy of lane-change completion is termed the percent correct lane (PCL). PCL is defined by the driver's position before and after the lane change. The visible point of the lane-change sign is 40 meters ahead of the sign position, so participants have 110 m to complete the lane change and stay in the center of the targeted lane (which provides a buffer if they failed in the previous maneuver).
For each road segment between two signs, the lane where the vehicle was most frequently
positioned was identified. Consistent lane choices were then defined as those cases where the
vehicle remained in the lane for more than 75% of the segment. This selected lane was then
compared to the correct target lane. For each track, the Percent Correct Lane was then calculated
as the fraction of the consistent lane choices that were correct.
To determine this position, the 3-lane road is divided into zones corresponding to parts of the lanes. The zones L1 to L3 correspond to a correct position in lane 1 (left lane), lane 2 (center lane), or lane 3 (right lane), while the "O" zones correspond to out-of-lane positions (Figure 5). The lateral position of the driver is defined by the zone that contains 75% of his/her trajectory between two signs; otherwise, the position is considered out of lane and the reaction timer outputs an NA instead of an RT. The correctness of each lane change is defined as follows: 1) "Correct LC": the end position of the driver is in the attended lane; 2) "No LC": the driver is in the same Li zone at the start and end positions; 3) "Erroneous LC": the end position of the driver is in a lane other than the attended one.
5.4 Lateral Control

[Figure: sample lane-deviation plots. The red curve is the base track; the grey region is the deviated area. Small deviation indicates better performance; large deviation, worse performance.]
Mean deviation (Mdev) comes from the difference between the position of the condition trajectory and the baseline curve (simplified with a path-mapping algorithm). With Mdev, I can calculate the lane-change behavior variance between the baseline run and the condition run. I can also obtain individual differences by comparing each participant's baseline run with the optimal curve. Outliers will be excluded if Mdev is larger than 1.2 (Tattegrain et al., 2009).
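A minimal sketch of the Mdev computation in R, assuming both runs have been resampled at the same longitudinal positions along the track (the path-mapping step itself is omitted):

# Hypothetical sketch: mean absolute lateral gap to the baseline run
mdev <- function(condition_y, baseline_y) {
  mean(abs(condition_y - baseline_y))
}
# e.g., flag a track as an outlier when mdev(...) > 1.2 (Tattegrain et al., 2009)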
References
• Michon, J. A. (1985). A critical view of driver behavior models: What do we know, what should we do? In Human behavior and traffic safety (pp. 485-524). Springer.
• ISO. (2010). ISO 26022: Road vehicles – Ergonomic aspects of transport information and control systems – Simulated lane change test to assess in-vehicle secondary task demand. International Organization for Standardization, Geneva, Switzerland.
• Koppen, C., & Spence, C. (2007). Audiovisual asynchrony modulates the Colavita visual dominance effect. Brain Research, 1186, 224-232.
• Tattegrain, H., Bruyas, M.-P., & Karmann, N. (2009). Comparison between adaptive and basic model metrics in lane change test to assess in-vehicle secondary task demand. In Proceedings of the 21st International Technical Conference on the Enhanced Safety of Vehicles (ESV), June 2009, Stuttgart, Germany.
• Sinnett, S., Soto-Faraco, S., & Spence, C. (2008). The co-occurrence of multisensory competition and facilitation. Acta Psychologica, 128(1), 153-161.
• Wickens, C. D., Hollands, J. G., Banbury, S., & Parasuraman, R. (2013). Engineering psychology and human performance. Pearson Education.
Appendix: Power Analysis

To compare the effect sizes of the learning effect and the condition effect, I compared the 1st visual-only MeanRT with the 6th visual-only MeanRT. The effect size of the learning effect (Cohen's d) was 0.206. The power to find the learning effect with 4 participants was 0.1 (for paired-samples t-test power analysis: small = 0.2, medium = 0.5, large = 0.8).
Secondly, I wanted to run a repeated-measures ANOVA based on data from the pilot study. With the same power used to find the learning effect, I need 9.1 participants per group to find the condition effect.
aovfour<-aov(reactionTime~1+condition+SpatialCongruency+Subj+SignNo.,data=four3)
Analysis of Variance Table

Response: reactionTime
                   Df   Sum Sq  Mean Sq F value    Pr(>F)
condition           3  2648816   882939  2.8991   0.03569 *
SpatialCongruency   1   145476   145476  0.4777   0.49014
group               3 86612017 28870672 94.7944 < 2e-16 ***
comment            17 55891159  3287715 10.7949 < 2e-16 ***
Residuals         245 74617407   304561
Accordingly, I divided the sum of squares of each factor by the total sum of squares. The effect size of condition was 0.012, and the effect size of spatial congruency was 0.0006. The effect size of individual differences was 0.394, and that of other noise was 0.340. The effect sizes of condition and spatial congruency were thus very small; most of the variance comes from individual differences and the sign order.
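As a minimal R sketch of that computation, assuming the aovfour fit above:

# Sketch: eta squared = each factor's Sum Sq divided by the total Sum Sq
aov_tab <- summary(aovfour)[[1]]
eta_sq  <- aov_tab[["Sum Sq"]] / sum(aov_tab[["Sum Sq"]])
names(eta_sq) <- rownames(aov_tab)
round(eta_sq, 4)

Then, I used the pwr package to calculate the power of the condition effect, which comes out to 0.05: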
> pwr.anova.test(k = 4, n = 18, sig.level = 0.05, f=0.01)
Balanced one-way analysis of variance power calculation
k = 4
n = 18
f = 0.01
sig.level = 0.05
power = 0.05039755
However, the result above is from a between-subjects ANOVA rather than a within-subjects ANOVA, so I used stats::power.anova.test and recalculated. With the same power used to find the learning effect, I need 9.1 participants per group to find the condition effect.
power.anova.test(groups = 4,
+ between.var = 13425.09,
+ within.var = 420396.5, power = .1)
Balanced one-way analysis of variance power calculation
groups = 4
n = 9.136422
between.var = 13425.09
within.var = 420396.5
sig.level = 0.05
power = 0.1