Getting one voice:
tuning up experts' assessment in measuring accessibility
Silvia Mirri
Ludovico A. Muratori
Paola Salomoni
Matteo Battistelli
Department of Computer Science
University of Bologna
Summary
Introduction
Automatic and manual accessibility evaluations
Our proposed metric
Conclusions and future work
Introduction
Web accessibility evaluations
automatic tools + human assessment
Metrics quantify the accessibility level or the barriers, providing a numerical synthesis:
• automatic tools return binary values
• human assessments are subjective and can take values from a continuous range
Our main goal
Providing a metric to measure how far a Web page is from its accessible version, taking into account:
• the integration of human assessments with automatic evaluations on the same target
• many human assessments
Steps
1. Mixing manual evaluations together with the automatic ones
2. Combining the assessments coming from different human evaluators
• Values are distributed over a given range
• The more experts' assessments contribute to a computed value, the more stable and reliable that value is
Automatic and manual evaluations: an example
Combination between the IMG element and its ALT attribute (a minimal check is sketched after this list):
1. If the ALT attribute is omitted, the automatic check outputs 1
2. If the ALT attribute is present, the automatic check outputs 0
A manual evaluation might state that:
• there is no lack of information once the image is hidden (this can happen in case 1, if the image is purely decorative)
• there is a lack of information once the image is hidden
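A minimal sketch of such a binary check, assuming an HTML parser like BeautifulSoup is available (the library choice and function name are illustrative, not the tool used in this work):

```python
from bs4 import BeautifulSoup

def alt_check(html: str) -> list[int]:
    """Return 1 for each IMG element whose ALT attribute is omitted, 0 otherwise."""
    soup = BeautifulSoup(html, "html.parser")
    return [0 if img.has_attr("alt") else 1 for img in soup.find_all("img")]

# The first image fails the check (1), the second passes (0); whether the
# present ALT text actually conveys the needed information is left to humans.
print(alt_check('<img src="a.png"><img src="b.png" alt="Company logo">'))  # [1, 0]
```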
Our metric
• A first version of our metric (Barriers Impact Factor) is
computed on the basis of a barrier-error association
table
• This table reports the list of assistive technologies/disabilities affected by each error (an example row is sketched after this list):
• screen reader/blindness
• screen magnifier/low vision
• color blindness
• input device independence/movement impairments
• deafness
• cognitive disabilities
• photosensitive epilepsy
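An illustrative sketch of such an association table; the error names and mappings below are assumptions made for the sake of example, not the actual table used by the metric:

```python
# Each automatically detectable error is mapped to the assistive
# technologies / disabilities it affects (example entries only).
BARRIER_ERROR_TABLE = {
    "img_without_alt":        ["screen reader/blindness", "cognitive disabilities"],
    "insufficient_contrast":  ["screen magnifier/low vision", "color blindness"],
    "mouse_only_widget":      ["input device independence/movement impairments"],
    "audio_without_captions": ["deafness"],
    "flashing_content":       ["photosensitive epilepsy"],
}
```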
Our metric
• Comparing automatic checks with WCAG 2.0 success criteria, we identified the following relationship: a check fails → either a certain error occurs or a manual control is necessary
• Each barrier is related to one success criterion and to
one level of conformity (A, AA or AAA)
• Manual evaluations take values in the real interval [0, 1] (a data-model sketch follows):
• 1 means that an accessibility error occurs
• 0 means the absence of that accessibility error
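A minimal data-model sketch under these definitions; the class and field names are assumptions, not the authors' implementation:

```python
from dataclasses import dataclass, field

@dataclass
class BarrierAssessment:
    success_criterion: str   # e.g. "1.1.1"
    conformance_level: str   # "A", "AA" or "AAA"
    automatic: int           # 1 = the check detects an error, 0 = it does not
    manual: list[float] = field(default_factory=list)  # expert values, each in [0, 1]

barrier = BarrierAssessment("1.1.1", "A", automatic=0, manual=[0.7, 1.0, 0.8, 1.0, 0.5])
```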
Weighting automatic and manual checks
1. m(i) = a(i): the formula is a mere average of automatically and manually detected errors
2. m(i) > a(i): a failure in the manual assessment is considered more significant than the automatic one
3. m(i) < a(i): a failure in the automatic assessment is considered more significant than the manual one
(a sketch of the weighted combination follows the quadrant figure below)
[Quadrant diagrams: the AUTOMATIC outcome (columns 0 and 1) is crossed with the MANUAL range [0, 1] (rows); quadrants I-IV label the possible automatic/manual combinations under the two weighting orders.]
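A sketch of one plausible weighted combination, consistent with the worked example on the next slide (a = 1, m = 2, automatic = 0, manual average = 0.8 give ≈ 0.53); the exact formula is the one defined in the paper, so treat this as illustrative:

```python
def combined_barrier_value(automatic: int, manual: list[float],
                           a: float = 1.0, m: float = 2.0) -> float:
    """Weighted combination of a binary automatic check with the averaged
    manual assessments; the weights a and m realize the three cases above."""
    manual_avg = sum(manual) / len(manual)
    return (a * automatic + m * manual_avg) / (a + m)
```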
Some considerations
• The more human operators provide evaluations of an accessibility barrier, the more reliable the computed accessibility level is
• This behavior is similar to that of online rating systems
• a new user's rating can be influenced by the evaluations already expressed by other users
• Variance must be considered so as to reinforce the computed accessibility level
A first assessment
PAGE CONTENT: an IMG element with ALT=“Image”, no link, no title

MANUAL EVALUATIONS:
• Expert A: 0.7
• Expert B: 1
• Expert C: 0.8
• Expert D: 1
• Expert E: 0.5

AUTOMATIC EVALUATION: 0 (no known errors; 1 alert: placeholder ALT text detected)

CBIF (with weights m = 2, a = 1):
Average = 0.8, Variance = 0.036, CBIF = 0.53
(these figures are reproduced in the sketch below)
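The same figures can be reproduced with a few lines following the illustrative combination sketched earlier (assumptions as before, not the authors' implementation):

```python
manual = [0.7, 1.0, 0.8, 1.0, 0.5]   # the five experts' assessments
automatic = 0                        # the ALT attribute is present, so the check passes
a, m = 1.0, 2.0                      # weights from the slide

avg = sum(manual) / len(manual)                               # 0.8
variance = sum((x - avg) ** 2 for x in manual) / len(manual)  # 0.036 (population variance)
cbif = (a * automatic + m * avg) / (a + m)                    # about 0.53

print(round(avg, 2), round(variance, 3), round(cbif, 2))      # 0.8 0.036 0.53
```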
Conclusions
• We have defined an accessibility metric aimed at evaluating barriers as a whole, combining the results provided by automatic tools with manual evaluations done by experts
• The metric has been preliminarily tested by measuring accessibility barriers in several local public administration Web sites
• Five experts are manually evaluating barriers related to WCAG 2.0 success criterion 1.1.1 (an automatic monitoring system verifies the page content and collects data from the manual evaluations)
Future Work
• Propose and discuss weights for the whole WCAG 2.0
set of barriers
• Investigate how the number of experts involved in the
evaluation, together with their rating variance, could
influence the reliability of the computed values
Contacts
Thank you for your attention!
For further information:
silvia.mirri@unibo.it