University of Exeter
College of Engineering,
Mathematics and Physical Sciences
ECM3735 Mathematics Group Project
Computer Assessment - The
Challenges and Potential
Solutions
Authors:
Candidate Numbers
003440, 035429, 006702,
000997, 019169, 008339,
011812, 006667.
Advisor:
Dr. Barrie COOPER
College of Engineering, Mathematics and
Physical Sciences Harrison Building
Streatham Campus
University of Exeter
North Park Road
Exeter
UK
EX4 4QF
Tel: +44 (0)1392 723628
Fax: +44 (0)1392 217965
Email: emps@exeter.ac.uk
December 7, 2015
Abstract
The purpose of this report is to explore the challenges
and potential solutions of current computer-based assessments. With
increasing numbers of applications for graduate jobs, there exists a
growing pressure among applicants to succeed at online assessments
set by employers. The vast number of applications received, compared
to available positions, creates an even greater need for employers to de-
velop effective and fair assessments. These can then identify the most
appropriate candidates, those best able to demonstrate their abilities
in numerical reasoning, which has been shown to be a reliable predic-
tor of job performance. In our report we approach four questions:
How do people learn through computer-based assessment? Why is it
important to study mathematics, and how does the Numeracy vs.
Mathematics debate bear on this? Why do certain employers use
numerical testing? Are certain types of learners better at numerical
reasoning tests? By creating our own numerical reasoning test, we
hoped to explore the factors that affect participants' performance.
The team carried out extensive statistical analysis to relate our findings
back to our hypotheses, and found significant results for all four of
our proposed hypotheses.
The overall findings of this report demonstrate that current numerical
reasoning assessments and practice tests are potentially flawed. Our
findings suggest that they fail to accommodate all types of learners,
and in most cases fail to provide comprehensive feedback. From our
research and test findings we encourage companies and educational in-
stitutions to take on board our recommendations, such as improving
both the feedback and preparation they offer to candidates.
Contents
1 Introduction 3
1.1 Aims and Objectives . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.1 Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.2 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Preliminary Findings . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.1 The ’Mathematics Vs Numeracy’ debate. . . . . . . . . 5
1.2.2 Why is mathematics important? . . . . . . . . . . . . . 7
1.2.3 Do people forget mathematics skills as they get older? 8
1.2.4 Why do certain employers use numerical reasoning as-
sessments? What skills do they think it will show? . . 9
1.2.5 How do people learn through computer-based assess-
ment? What works and what does not? . . . . . . . . . 14
1.2.6 Will different types of learners (kinaesthetic, visual
etc.) have different levels of numeracy? . . . . . . . . . 15
2 Methodology 16
2.1 Group Organisation . . . . . . . . . . . . . . . . . . . . . . . . 16
2.1.1 Meeting Times . . . . . . . . . . . . . . . . . . . . . . 16
2.1.2 Communication . . . . . . . . . . . . . . . . . . . . . . 17
2.1.3 Combatting Risk . . . . . . . . . . . . . . . . . . . . . 18
2.1.4 Subgroups . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2.1 Preliminary data collection . . . . . . . . . . . . . . . . 20
2.2.2 Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.3 Test design, creation and analysis . . . . . . . . . . . . . . . . 23
2.3.1 Producing the Questions . . . . . . . . . . . . . . . . . 23
2.3.2 Programming the Test . . . . . . . . . . . . . . . . . . 25
2.3.3 Test distribution . . . . . . . . . . . . . . . . . . . . . 31
2.3.4 Test Analysis . . . . . . . . . . . . . . . . . . . . . . . 32
2.4 Report Feedback . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.4.1 Skill development - Graduate Skills . . . . . . . . . . . 36
3 Findings 38
3.1 Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.2 Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.2.1 The Maths Vs. Numeracy Debate. Why is mathemat-
ics important? . . . . . . . . . . . . . . . . . . . . . . . 47
3.2.2 Why do employers use numerical reasoning testing? . . 50
3.2.3 Do different learners perform better on numerical rea-
soning tests? . . . . . . . . . . . . . . . . . . . . . . . . 52
3.2.4 How do people learn through computer-based assess-
ments? . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.2.5 Regression Modelling . . . . . . . . . . . . . . . . . . . 56
3.3 Feedback Findings . . . . . . . . . . . . . . . . . . . . . . . . 61
4 Conclusion 63
5 Evaluation 66
5.1 SWOT Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.1.1 Strengths . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.1.2 Weaknesses . . . . . . . . . . . . . . . . . . . . . . . . 67
5.1.3 Opportunities . . . . . . . . . . . . . . . . . . . . . . . 71
5.1.4 Threats . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.2 Improvements . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.3 Further Research . . . . . . . . . . . . . . . . . . . . . . . . . 75
6 Bibliography 77
7 Appendix 81
1 Introduction
In recent history our world has borne witness to some of the most
revolutionary and exciting technological advances of all time. We live in an
age where computers seem to hold a role in society on par with that of the
basic necessities such as food or water. Our planet no longer revolves merely
around the sun, but around all things computer related. These advancements
have caused a great evolution in human society; we have gone from a very
much physical world to a more paperless and virtual one. This is even true
for assessments. Nowadays companies require candidates to be assessed via
online tests as opposed to the traditional paper and pen exam. This report
endeavoured to explore such computer-based assessments. In particular, we
have looked at numeracy tests exploring both the challenges they face and
the potential solutions.
1.1 Aims and Objectives
1.1.1 Aims
Computer-based assessments are widely used by employers, govern-
ment departments and educational organisations; the list nowadays is endless.
The question is why? First and foremost, such assessments examine the aptitude of
a candidate in a particular subject. Two common subjects examined
via online assessments are numeracy and literacy, two
key skills for employability and the applied counterparts of Mathematics
and English respectively. There is a definite need for candidates to both learn
and improve from these tests. Motivated by this fundamental necessity we
have focused on creating our own form of computer-based assessment as part
of our project. We have used this as a vehicle with which to answer several
questions that target the way in which a person learns. This has been done
by analysing data from a sample of people who had taken our test. We have
carried out research around this subject area of computer-based assessments
and have used our data to compare our findings with the current literature
available and thus provided insight into online testing that can be useful to
both universities and employers.
1.1.2 Objectives
Our objectives were clear. Firstly, we have researched extensively
into literature relating to education and current learning methods. This lit-
erature has enabled us to understand the theory behind the ways in which
people learn and improve. Our research on this was both general and specific
to computer-based assessments. Furthermore, we have explored the types of
learners that exist. Once these were clearly classified we were able to explore
effective learning methods catered specifically to them. This acted as an im-
portant step since the aim had been to create an assessment which provided
effective and comprehensive feedback.
We planned to develop a computerised numerical reasoning
test with which to gather the necessary data, in order to support or con-
tradict the literature and current research hypotheses that exist within the
community. We also planned to incorporate an adaptive feature into our as-
sessment, crucial in equipping it with the ability
to tailor questions to an individual's ability based on their performance
on previous questions. This has not only aided our statistical analysis but
has also provided participants with the relevant practice and training
they require. Finally, we planned to provide real-time feedback,
allowing participants to instantly identify where they went wrong and,
more importantly, how to correct it.
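The adaptive mechanism described above can be illustrated with a minimal sketch (a hypothetical Python example, not our actual test code), in which a correct answer raises the difficulty of the next question and an incorrect answer lowers it, clamped to a fixed difficulty band:

```python
def next_difficulty(current, answered_correctly, lowest=1, highest=5):
    """Move one difficulty level up after a correct answer,
    one level down after an incorrect one, clamped to the band."""
    step = 1 if answered_correctly else -1
    return min(highest, max(lowest, current + step))

def difficulty_trace(responses, start_level=3):
    """Given a sequence of True/False responses, return the
    difficulty level at which each question would be asked."""
    level, trace = start_level, []
    for correct in responses:
        trace.append(level)
        level = next_difficulty(level, correct)
    return trace

# A candidate who answers right, right, wrong, right climbs,
# drops back one level, then climbs again.
print(difficulty_trace([True, True, False, True]))  # [3, 4, 5, 4]
```

In a real assessment each difficulty level would map to a pool of questions, and the final level reached can feed into both the feedback given to the participant and the subsequent statistical analysis.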
The test was primarily a numerical reasoning test and so incorpo-
rated numerical skills similar to those tested in graduate job applications.
We therefore felt it would be beneficial to research the question “Why test
mathematics?” These findings shed light on exactly why employers incorpo-
rate such assessments into their application process and what they hope to
discover through doing so.
Age is a factor which uniquely defines a person. In our project, we
have explored whether there is evidence to support the idea that mathemat-
ics skills deteriorate if you stop using them; that is, that
mathematics skills are not permanent and require regular revision in order
to remain within one's memory. This period of disuse could be, for example,
the years following GCSE or A Level study. We have been able to examine whether people who
study mathematics have a greater advantage compared to others. All of these
findings have been compared against the current literature available and thus
after analysis of the results of our test, we have been able to highlight any
supporting or contradicting trends.
There is a widespread, controversial debate, not only across the aca-
demic community but also across the world, as to whether mathematics and
numeracy are essentially the same thing. It poses the question of whether nu-
meracy skills essentially rely on mathematical skills and vice versa, or whether
they are in fact completely separate disciplines. We felt this was a relevant
area to explore since there has been talk that the UK Government
plans to change the current mathematics GCSE by removing numeracy from
mathematics and treating the two as independent subjects.
We hope to discover whether mathematics students actually have an advan-
tage in numeracy tests, given that numeracy is a skill neither relevant to, nor
practised at, degree level.
We have modelled the data that we had collected from our test re-
sults, with relevant graphs, and have further analysed it using appropriate
statistical techniques. Modelling our data in appropriate mediums has en-
abled us to efficiently compare our results to those found in the literature.
Finally, we have aimed to assess how useful our findings are relative
to our broader stakeholders. We have set out to measure to what extent,
if any, we had been able to contribute to the current problem of learning
using computer-based assessments. It has been an objective of ours to both
highlight the problems with what is currently available and possibly improve
it, by providing solutions using our findings. We have planned to approach
professionals and experts in this field with our results in order to get reliable
feedback.
1.2 Preliminary Findings
1.2.1 The ’Mathematics Vs Numeracy’ debate.
An ever greater need for both mathematical and numerical skills
is emerging. However, there is a significant debate within society as
to whether mathematics and numeracy should be considered the same
thing, or whether numeracy should be a subject in its own right. There are
current plans for GCSE Mathematics in the UK to be split into two
separate, independent GCSEs: Mathematics and Numeracy [60].
Mathematics is defined by the Oxford Dictionary as “the abstract sci-
ence of number, quantity and space, either as abstract concepts (pure math-
ematics), or as applied to other disciplines such as physics and engineering
(applied mathematics)” [18]. Numeracy, moreover, is defined by the Oxford Dic-
tionary as “the ability to understand and work with numbers” [19]. It may
be concluded from these two definitions that numeracy is a subset of mathe-
matics. However, it can also be argued that numeracy is a subject in its own
right and should be separated from mathematics as it is more applicable in
society and the workplace.
Interestingly, a paper on Numeracy and Mathematics from the Uni-
versity of Limerick, Ireland, contained no universally accepted definition of
numeracy [42]. This is backed up by research from the University of Arizona
which found that the difference between numeracy and elementary mathe-
matics is analogous to the difference between quantitative literacy and math-
ematical literacy. [29] More importantly, no universal definition for numeracy
was agreed upon, although there was much overlap between current working
definitions. The most important difference between the two forms of literacy
is that quantitative literacy puts more emphasis on context, whilst mathe-
matical literacy focuses on abstraction [29].
In a paper produced by the University of Stony Brook’s Applied
Mathematics and Statistics Department, it is stated that all mathematics in-
struction should be devoted to developing “deeper mastery” of core topics,
through computation, problem-solving and logical reasoning - which is effec-
tively what a numerical reasoning test examines. Simple proportion problems
can be integrated into fraction calculations early on. In addition, the devel-
opment of arithmetic skills in working with integers, fractions and decimals,
should be matched with increasingly challenging applied problems, many in
the context of measurement. Solving problems in different ways ought to
be an important aspect of mathematical reasoning with both arithmetic and
applied problems in order to ensure a sufficient level of numerical skills for
further progression in society [56].
The Guardian newspaper produced an article exploring a world-
wide problem associated with the difference between mathematics in edu-
cation and mathematics in the real world. The article states that all over
the world we are mostly teaching the wrong type of mathematics [28]. The
Guardian then went on to describe how computers now carry out most
calculation, despite the fact that we still tend to train people for this
purpose; this is true almost universally [28]. We can relate this to the
context of the ’Mathematics Vs Numeracy’ debate: generally, the math-
ematics taught in education is too pure and distant from the real world, while
on the whole the mathematics used in everyday life is numeracy.
Many companies require potential employees to sit a numeracy test
before commencing employment, despite the fact they already hold nation-
ally recognised qualifications in mathematics. An article from an electronic
journal for leaders in education explores this. Findings show that although
the term “numeracy” is not widely used across the world, there does ex-
ist a strong consensus that all young people are required to become more
competent and confident in using the mathematics they have been taught.
Furthermore, numeracy is found to bridge the gap between school-learned
mathematics and its applications in everyday life [1]. These findings sup-
port companies in their efforts to use numerical reasoning testing as a way of
seeing whether candidates can efficiently use their formal qualifications in a
practical environment. A candidate may have achieved high results in their
school exams, but this does not necessarily mean that they will be able to use
the gained qualifications for practical problem solving which is recognised as
the main use of mathematics [28].
An insufficient level of numeracy skills has been found to lead to
unemployment, low wages and poor health, further highlighting the im-
portance of numeracy [43]. The need for mathematics exists in all aspects of
everyday life: within the workplace and in other practical settings such as
schools, hospitals, the news and the understanding of statistics [14].
1.2.2 Why is mathematics important?
The study of mathematics can lead to a variety of professional ca-
reers such as research, engineering, finance, business and government services
[14]. This is supported by the University of Arizona's Department of Mathe-
matics, which also adds the social sciences to the above fields [15]. It should be
noted that these careers are fundamental to the world's economy.
It is therefore important to ensure that people working within those fields
have sufficient skills for correct and efficient problem solving, pre-
venting any detrimental consequences.
Finally, it has been suggested that poor numeracy leads to depres-
sion, low confidence and low self-esteem, leading to social, emotional and
behavioural difficulties increasing social exclusion, truancy and crime rates
[41]. In the digital age, 90% of new graduate jobs require a high level
of digital skills, which are built on numeracy. Although computers are able
to reduce the need for human involvement in certain calculations, sufficient
numeracy skills are required to use them efficiently [41].
1.2.3 Do people forget mathematics skills as they get older?
Research has found that a severe loss of both numeracy and lit-
eracy skills often occurs in adulthood, with 20% of adults experiencing dif-
ficulties with the basic skills needed to function in modern-day society [6]
[20] [38]. Many adults find simple numerical calculations, such as percentages
and powers, difficult, despite these being taught and tested to the government's
standard throughout education.
The effect of being unemployed has been explored for both men
and women and it has been found that numeracy skills get steadily worse
the longer a person is without a job [6]. Interestingly, women experience a
lesser effect than men due to their role in society being more diverse, hence
requiring them to use their numeracy and literacy skills more frequently. It
has also been found that the loss of skill largely depends on the starting level
of knowledge and understanding, and that those who have poor skills to
begin with experience a more severe deterioration. Furthermore, numeracy
skills have a smaller presence in everyday life, as people find themselves
reading far more often than performing calculations. However, a decrease in liter-
acy skills leads to an even further loss of numeracy skills, as it increases the
difficulty of understanding a posed question.
Important findings have been made amongst a group of nursing stu-
dents, who were asked to sit a numeracy test containing questions similar to
those they will have to answer as part of their future job [20]. The aver-
age score was 56%, with the most common type of error being arithmetic.
There was a significant difference in results between students who began
higher education immediately and those who took a year out first: those
who started immediately scored 70% on average, while the others aver-
aged only 47%. This shows that being in an environment that doesn't require
the use of numeracy skills has a deteriorating effect, not only on the ability
to perform simple calculations but also on the ability to extract the relevant
information to set up an equation. This means that even with the use of a
calculator, these students are still likely to make mistakes. Students have
also been found unable to identify errors in their work, even when the result
found is unreasonable and unrealistic. Such results are potentially danger-
ous: for example, nursing students must perform calculations such as drug
dosages, which, if incorrect, harm the public and cost the
employer in having to provide additional training.
Due to the ever increasing importance of skills in the world of work,
especially early on in a career, a lack of numerical competence has an un-
desirable effect on employment of these individuals, which in turn affects
their standard of living [6] [46] [38]. Such requirements are brought about by
the recent changes to the labour market, with fewer semi-skilled or unskilled manual
jobs available due to technological developments [46]. Unskilled workers have
difficulty in both gaining and retaining employment, and so are the first to
suffer in the case of downsizing or a crisis [6]. A low level skill set also limits
individuals to lower and middle range jobs (bottom 10% to 20%), preventing
them from experiencing career growth, and leading to severe social exclusion
[46] [6] [7]. This causes a downward cycle, as a low skill level is passed on from
parents to children, thereby accelerating unemployment through the gen-
erations [7].
The government has recognised this problem and created a ’Skills
for Life’ programme, which aims to provide basic skills to adults in order to
help them gain employment [38]. Other solutions include on-the-job train-
ing, or, as research suggests, we can even prevent such severe skill loss by
ensuring pupils reach a certain skill level whilst still in education [6].
There are, of course, other factors which lead to a low level of nu-
meracy skills, such as family background, learning environment and quality
of education [6] [3]. However, in this report we concentrate on how the low
level of demand for numeracy in everyday life affects a student’s performance
in an online test.
1.2.4 Why do certain employers use numerical reasoning assess-
ments? What skills do they think it will show?
In a constantly changing and advancing busi-
ness world, the way in which people are hired may be a natural result of
shifts in the business environment and modern workforces. A number of
studies mentioned in A. Jenkins' 2001 paper speculated that the increase in
numerical tests is due to the greater professionalism of the human resource
sector of many businesses, as well as the inclusion of standard selection pro-
cedures within them [33]. In the 21st century, Human Resources (HR)
has evolved massively, and is now an integral part of most organisations [58].
All these factors may have led to the rise of assessment centres, due to a con-
tinuous desire amongst companies to gain a professional edge. They do this
by searching for alternatives to traditional methods of employment, much of
which is done through HR. This greater reliance on HR as a business sector
has led to the employment of much stronger recruitment methods, which (for
reasons that will follow) enable them to meet legislation requirements, and
promote a fair practice.
In many workforces, it has been clear in recent years that employ-
ability tests are used for purposes other than just performance testing.
They provide a platform that assesses candidates on merit rather than personal
criteria, reducing the impact of discriminatory practices [4]. Due to equal
opportunity legislation in many countries, most commonly related
to the differing proportions of ethnic groups hired, many employers could be
vulnerable to prosecution [58]. These types of psychometric tests
can therefore be used as a way to reduce bias and discrimination [33]. One
factor explaining the increase in the use of these tests may therefore be
a prudential response to changes in hiring attitudes and legislation. On the
other hand, the opposite has also been said - that companies need to keep
legal compliance in mind when they use psychometric tests [12] so as not to
offend candidates by using irrelevant tests. In addition, when using these
tests, the role of bias has been explored as many psychologists and compa-
nies note that testing is an intrinsically culturally biased procedure that can
cause discrimination against ethnic minorities. This is as a result of cultural
differences leading to consistently different answers across several different
social groups. It can be noted, though, that this applies more specifically to
judgement and situational tests, and not to the numerical and verbal reasoning
tests on which our research focuses [30].
The rise of these tests could also be attributed to the workplace's
lower regard for formal qualifications as a method of streaming candidates
and predicting their future abilities [33]. This may be because young labour
force entrants across the EU have much higher attainments than they previ-
ously did, and hence it is harder to sort applicants out at the top end of the
spectrum based on attainment than in the past. This may lead employers to
screen applicants much more carefully [33]. Potentially this was caused by
the previous decade of education being hailed as ’too easy’ [27], which caused
achievements to be very high. Periods like this can have knock on effects on
recruitment methods, as a reaction to these ’more qualified’ applicants fil-
tering through the recruitment system and into the business environment.
However, this may be subject to change, given that recent education reforms
claiming to 'toughen up' the curriculum have yet to see their full effect,
particularly in terms of employment. Examples of a lack of belief in the edu-
cation system can be seen by the movements of top employers. An example of
this would be one of the ’big four’ professional services firms, Ernst & Young
[26], who have recently changed their application rules so that educational
attainments, such as degree class, are no longer taken into account. Instead
they believe that their own in-house testing and assessment centres are a
reliable enough indicator of whether candidates will succeed [26]. Another
example of this was with the introduction of the Army’s own mathematics
test for applicants. The reason for its development was the increasingly chal-
lenging task of using GCSE mathematics results as a discriminator amongst
applicants for technician roles [33]. If formal qualifications continue to be an
insufficient indicator of applicants' abilities, then companies will have to find
new methods to screen them, as is happening already with the increase in
psychometric testing.
When beginning our research, we went down many different routes
to get a broad range of information. Through emails and other means of
correspondence, we identified a few problems that employers encounter with
these psychometric tests. Firstly, they are not always sat in test centres, and
many are done online. This always leaves the possibility that people may
try to cheat on these tests and get other people to sit them on their behalf
[4]. This is unfair on other candidates, as well as misrepresentative, causing
people who may not be suited for a role to progress further in applications
than they otherwise would. Having said this, most of these tests have been
designed in such a way that they are fairly difficult to cheat on - for instance
having time restraints [53]. We have also found that these tests are mostly
used as a means of filtering candidates, so passing them doesn’t necessar-
ily guarantee any further success. Secondly, some companies have said that
tests may potentially be unrepresentative since people only get one chance to
take them [53]. Due to many different circumstances, an applicant may well
underperform on the test, and so not demonstrate their full potential. This
could cause companies to miss out on hiring perfectly well-suited candidates,
in which case the tests would be causing a misallocation of their resources.
Some companies have a validation test in place that allows people who got
unexpected results to retake the test. However, obviously not all companies
will guard against inconsistencies in this way [53]. On the contrary, many
recruiters we spoke to stated that these tests and their scores are used only
to assist in the recruitment process, and are not the sole factor for employ-
ing people [51]. Instead they are used as a guidance to help make informed
decisions on applicants, so a well rounded application is essential in addition
to these tests [4].
“Numerical reasoning is the ability to understand, interpret and
logically evaluate numerical information. Numerical reasoning is a major
facet of general cognitive ability, the strongest overall predictor of job per-
formance” [44].
Because of the numerical reasoning skills candidates exhibit when taking them,
numerical reasoning assessments are seen as the 'best overall predic-
tor of job performance' [44]. Numerical and verbal reasoning tests are
combined to form an overall aptitude assessment that highlights the most well-
rounded people suited to the job. Aptitude tests show employers skills that
cannot be replicated in interviews, nor be observed by reading CVs and look-
ing at past references. They are a true, accurate and quick assessment of how
candidates perform on the spot in a pressured environment. The 'govern-
ment mathematical report' [25], alongside careers websites such as Assess-
ment Day Ltd [37] and Inside Careers [31], agrees that the only mathemati-
cal abilities being tested on numerical assessments are addition/subtraction,
multiplication, percentages, currency conversions, fractions and ratios. In
addition, they are testing the ability to “interpret the tables and graphs
correctly in order to find the right numbers to work with” [31]. Numerical
reasoning tests are normally timed, in order to measure applicants’ ability to
think on their feet and problem solve under time pressure.
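As a concrete illustration of the skills listed above (the figures, years and exchange rate below are invented for this example), a typical question might show a small revenue table and ask for a percentage change and a currency conversion:

```python
# Hypothetical revenue table, in GBP thousands, of the kind
# presented alongside a numerical reasoning question.
revenue_gbp_k = {"Year 1": 480, "Year 2": 540}

# Q1: What is the percentage change in revenue from Year 1 to Year 2?
change = (revenue_gbp_k["Year 2"] - revenue_gbp_k["Year 1"]) \
    / revenue_gbp_k["Year 1"] * 100
print(f"{change:.1f}%")  # 12.5%

# Q2: At an assumed rate of 1.50 USD per GBP, what is Year 2
# revenue in USD thousands?
usd_k = revenue_gbp_k["Year 2"] * 1.50
print(f"{usd_k:.0f}")  # 810
```

Each step here uses only the arithmetic named in the sources, but the real difficulty in such tests comes from reading the right figures out of the table under time pressure.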
Prospects [45], a website designed to help people looking for jobs,
stated that employers in most industries are looking for applicants with plan-
ning and research skills, i.e. those applicants with the ability to find relevant
information from a variety of different sources. Information can be presented
in a variety of ways, such as with numbers, statistics or text in tables, graphs
and reports. Employees need to be able to understand, analyse and interpret
research and appropriately use it. Numerical assessments are testing these
exact skills.
In addition, tests can have varied levels of difficulty, to represent
the levels of numerical skill that will be needed for the specific job. SHL
Talent Measurement Assessments create a wide range of tests, ranging from
aptitude and personality tests, to customised tests for individual companies
[8]. They create a variety of tests appropriate for different job levels and
industries. Numerical reasoning tests can be adapted to have more complex
questions, requiring a more advanced level of numerical knowledge and skill.
Another way of making them more challenging is to shorten the time avail-
able to complete the test. SHL have quoted that their tests represent the
’level of work and academic experience’ [8] required for a specific job role.
For example, SHL released an ’Aptitude: Identify the Best Talent Faster and
at Less Cost’ brochure [9] stating that a Semi-Skilled staff job will require
a VERIFY Calculation Test, whereas a Director or Senior managerial job
will need to be tested using the VERIFY Numerical Test, which is far more
advanced.
Furthermore, as numerical reasoning is just one aspect of an apti-
tude assessment it means applicants applying for highly numerical jobs may
also get asked to take a verbal reasoning test. In all jobs, an ability to com-
municate with colleagues is essential. This reiterates the fact that aptitude
tests are used to find the overall highest-calibre applicant.
A job application process is not a simple task. For many job appli-
cations candidates must spend hours researching the company, before writing
the application form and preparing for interview. Practising the skills ex-
amined in numerical tests is just another aspect of a job application that
requires preparation. Does an applicant’s mark improve with practice? If so,
then applicants can practise in order to achieve high results, no matter what
degree they study or how long it’s been since they last studied mathemat-
ics. For example, even an applicant that stopped studying mathematics at
GCSE level can use the numerous online resources available to practice and
prepare for numerical tests, and hence could easily ’revise’ for such a test
and potentially perform very well.
The overall consensus from our sources is that numerical tests used by large companies (especially those with large numbers of applicants) are generally a candidate-streaming process. With UK education standards rising and a larger number of students entering higher education (in January 2015, 592,000 people had applied to university, up 2% from the year before) [11], more people are eligible to apply for graduate schemes. High Fliers Research presented their findings in a report, ’The Graduate Market in 2014’, covered by the Telegraph [49], which stated that graduate schemes now receive approximately 39 applications for every available job. With the number of students
applying to such schemes high and rising, it is extremely hard to differenti-
ate between candidates who have all achieved high grades and well regarded
university degrees. How do you select the ’best’ candidate from thousands
of similar applications? Due to this difficulty, companies use these tests to reduce the number of applicants they consider in the next application step. According to Personnel Today [47], 80% of companies use standard off-the-shelf numerical tests provided by companies such as SHL. Only 18% use a test tailored to measure the unique, customised skills that they are looking for. Some would argue that since off-the-shelf tests are not unique to a company, such a numerical test will not truly assess competency for a specific job role.
1.2.5 How do people learn through computer-based assessment?
What works and what does not?
Another topic we explored was how people learn through computer-
based assessment. There are many methods that aid learning on a com-
puter. The most popular and commonly used forms of these are multiple
choice or true/false questions, labelling images, rank ordering and gap filling. Computer-based assessments can be very popular with both students and teachers. They increase student confidence, and students like them because results are rapid, if not immediate. They can even be completed in a student’s own time, when the student is ready. Teachers are also likely to use these methods as a way of administering frequent formative or summative assessments, since less time is spent marking. Not only can they then spend more time adapting their teaching methods (depending on the results of these assessments), but they can do so reasonably soon after the test is taken [39].
Feedback is crucial to the learning process and, as mentioned, one
of the advantages of immediate feedback is that the student receives their
result straight away, rather than after they’ve moved on from a particular
topic. A study conducted at the University of Plymouth [36] compared two groups of students: one using several online materials with two levels of feedback, and another using none, to see how each performed in an end-of-module summative assessment. The group using the available study materials performed significantly better than the other group.
Although computer-based assessments can greatly benefit a student’s learning, there are concerns that online tasks, especially multiple-choice questions, do not encourage deep thinking about a topic and so do not aid learning [34]. In order to be as beneficial as possible, these assessments need to both engage and motivate students.
1.2.6 Will different types of learners (kinaesthetic, visual etc.)
have different levels of numeracy?
Our final area of research was different learner types, and whether
some of them would be better at numeracy than others. According to ESL
kid stuff, there are many different types of learners, such as Tactile, Global
and Analytic. However, most people fall into at least one of the following
three categories: Kinaesthetic, Visual and Auditory [52]. Katie Lepi [35]
describes these types of learners in her article, “The 7 Styles of Learning:
Which Works For You?”. She describes kinaesthetic (or physical) learners
as people who prefer using their bodies, hands and sense of touch. Writing
and drawing diagrams are physical activities, so this sort of activity really
helps them learn. Role-play is another commonly used activity for these
types of learners. They often have a ’hands-on’ approach, so learn best from
partaking in physical activities. On the other hand, visual learners do better
by looking at graphs, watching someone do a demonstration or simply by
reading. Finally, auditory learners are the kind of people who would rather
listen to something being explained to them than read about it themselves.
A common way for them to study is to recite information aloud, or to listen
to recordings. They also usually like to listen to music while they study [57].
There are many different learning styles, and even though most people use a combination of all three techniques, they usually have an idea of how they learn best. Knowing what type of learner you are from a young age puts you at an advantage. However, it is also important to adapt your learning techniques whilst you are young, so that you are able to use each one effectively [48]. Our aim is to see if there is
a correlation between numerical ability (based on our test results) and type
of learner. We understand that online computer-based assessments mainly
cater for visual types of learner, and so we do not aim to change the online
test in order to reflect this, but instead hope to test this theory as part of
our analysis.
2 Methodology
2.1 Group Organisation
In this section, we discuss how we took full advantage of the time given to
complete this project, by organising the group members efficiently.
2.1.1 Meeting Times
In order to make the most of our meetings, it was important to
choose a suitable time for everyone. We decided it would be best to meet two to three times a week, including a weekly meeting with our project advisor. We initially discovered that, due to timetable clashes, there were not many slots in the week that suited everyone. To make things clearer, we used the widely used online scheduling tool Doodle (see Figure 1) to pick a convenient time for all group members. Doodle worked well: it was quick and efficient, and it prevented the confusion we had found when suggesting times among ourselves. In the first few weeks of the project we met frequently; as term progressed we settled on set weekly times, 15:30-16:30 on a Monday and 10:00-12:00 on
a Wednesday. To make sure we had a private space for every meeting, we
assigned one person to be responsible for booking rooms. During these meet-
ings we would discuss development of the project, by updating each other on
the progress of our individual responsibilities, and we would delegate future
tasks.
Figure 1: An example of us using Doodle to decide on suitable times for our
group meetings.
2.1.2 Communication
One in seven people now use Facebook to connect to their family
and friends [32]. It is the most popular form of social media. As a result,
we decided the best form of communication between group members would be through Facebook. We created a closed group (see Figure 2) so that we could share files containing any work we had completed. We also exchanged numbers and created a ’Group Chat’ on Whatsapp, an instant messaging application. The team looked into using Google Documents to keep and edit our work, but found it limiting, as the site required a Google account, which not all group members had; it was also more difficult to facilitate comments and project-related discussions there. In contrast, our Facebook group allowed all of these things, and it was quickly decided that this site would be our main form of communication, as no other platform worked more efficiently.
Figure 2: Evidence that we created a closed Facebook group with all mem-
bers.
2.1.3 Combatting Risk
The decision to use Facebook as our main method of communication
was ideal for our project. It minimised the possibility of losing files and data,
which would have had a huge impact on our project. The use of a closed group meant every member could access and upload documents quickly and efficiently throughout the project, so that the rest of the group could edit key information or findings if necessary. We also decided to split into subgroups, which combatted the risk of absence: if one member of a subgroup was not able to complete a certain piece of research, for example due to illness, the other members would be able to finish it, since they would also have a good understanding of the task, having been studying the same topic.
Initially we went about identifying all the tasks and activities we
wanted to complete throughout our group project. We were then able to
create a critical path (see Figure 3) to see if we would be able to finish
all these tasks within the time available. The critical path also allowed us
to recognise what needed to be prioritised and what could be completed in
parallel to one another.
Figure 3: Our Critical Path Analysis.
2.1.4 Subgroups
Once we had highlighted the key parts of our project, we decided to split into subgroups to spread the workload. This enabled us to undertake multiple tasks at once and collaborate to meet our timeframe. The four groups were: writing the questions, programming, statistics, and writing up the report. When deciding whom to put in which subgroup, we asked each individual what their strengths and weaknesses were, in order to best utilise our skills; for instance, some members of the group preferred programming to statistics.
Deciding who would be in each subgroup was not difficult. Some members of the team were interested in the creative nature of writing the questions, while others had enjoyed computer programming modules taken in previous years. We decided to put more people into the programming subgroup, having highlighted early on that this was probably going to be the most time-consuming part of the project, and that there was not a lot of previous programming knowledge within the group. Some members had statistically analysed models in the past, so they formed a statistics subgroup. Finally, another subgroup put themselves forward for editing and compiling the final report, as they had experience working with LaTeX and enjoyed editing written material. Even though the final version of the report would be passed through this subgroup, everyone took a very active role in the write-up of the report.
2.2 Data Collection
2.2.1 Preliminary data collection
The next stage for our team was to gather preliminary data to aid
our project - in particular with the development of our own online test. We
started by doing some initial research around our topic, in order to find areas
that we could look into further. After discussing our initial findings, we came
up with four main topics that we would research further, as stated in our
introduction. As a result, we had to forgo many other interesting areas, but
we decided that these were the four most relevant areas on which to focus
our objectives. We also felt that including any more areas of study would
cause us to not have enough time to complete the project, nor would we be
able to write about them in sufficient depth. We split up our team into four
two-person groups and assigned a different area of research to each one, so
as to manage our time and resources more efficiently. The only downside of this was that not everyone in the group was fully informed on every topic. However, this was easily overcome by compiling our research into one document and making it available on every social platform that we were using.
We went about our research in a variety of ways. Firstly, we used available literature, reading papers, articles, books and websites to find evidence for or against our initial thoughts on each topic. This demanded considerable analytical skill on the part of the researchers, who had to read through huge amounts of information
and extract the necessary details in an articulate way. In addition, we car-
ried out primary data collection by emailing and contacting relevant sources,
such as employers, online test providers and academics. For some of these
we established individual contact, asking them specifically for advice or more
information on our project, but for the bulk of employers and career websites,
we generated a questionnaire to distribute to them. We decided to do this in bulk after quickly realising that not many companies were responding to our emails. This could have been because they were not interested in our group project, or because some companies were too large to assign a contact or a specific department to deal with us. Using inputs from
the separate research groups, so as to make the questionnaire as relevant and
useful as possible, we asked a range of questions. This questionnaire was also
in a far easier format for companies to respond to, as it saved them time
and effort formulating unassisted responses. Bulk distribution ensured that
we got as many responses as we could in the limited time frame we had to
complete our research.
Once the research stage of the project had been completed and we
had all our necessary sources, we began to write it up. Within our subgroups,
we compiled our best findings and formalised them for our report. We each
wrote up our sections, complete with references, ready to be passed along to
the editing team. With this, we also included a full write up of our reference
information to go into our bibliography.
2.2.2 Survey
Now that the research stage of our project had been completed, it
was time to move forward with the creation of our own online resource to
test our findings. After discussing it as a group we decided that one of the
easiest and quickest ways to gather information was by creating an online
survey. We felt that this was far quicker to distribute and analyse results
with than other methods, such as focus groups, meaning we would have less
of a time constraint. The aims of the survey were firstly, to test some of
the conclusions and theories formed from our research and discuss what this
showed and secondly, to help us create our computer-based assessment by
finding out what students find most useful when they are learning. To do
this, we asked several questions about learning techniques, types of learners
and effective testing methods. We then passed this information on to the
subgroup in charge of writing the questions for our online test. They used the
survey feedback to help us create a test in response to what people preferred.
We felt this would give us a more tailored test written in the most helpful
way to students. The fact that the test was designed with student input
in mind meant that we could try to benefit test participants, and hopefully
improve on currently available tests.
Figure 4: The first page of our survey.
We created the survey using Google documents (see Figure 4) and
set it up as a form. We looked into other online survey distributors but
found Google documents to be the best platform as most required a payment
to release the survey if it contained more than 10 questions. Google Forms allowed us to ask an unlimited number of questions, was quick and easy to use, and exported our data straight into Excel for us to analyse.
contributions from all research groups we generated a draft survey. The sur-
vey was then checked by the group to ensure it was appropriate before it
was released. This allowed us to make a few necessary changes to the word-
ing and remove overlapping questions in order to shorten the test. We were
aware that people might be put off taking our survey if it was too lengthy
and therefore time consuming. For this reason, we tried to ensure that most
questions involved answering either with multiple choice or with a scale of
agreement. We also tried to make sure that the survey took no longer than
15 minutes to complete.
We then distributed the survey to the public so that we could anal-
yse our results as quickly as possible, given that we were on a tight schedule.
To take the survey all that was needed was the web link. We spread this link
across as many social media platforms as we could, including Facebook and
Whatsapp. We felt that this would be the quickest way to distribute our sur-
vey as it would target our main audience, students, in a way that was easily
accessible for them. The fact that the form was created online made analysis
far easier as we could see responses as they came in, and so by keeping an
eye on the data, we were able to start analysing the feedback as soon as we
had a sufficient number of responses. After approximately a week, we had
gathered a large number of responses, and when numbers began to plateau
we decided to start reviewing the data.
2.3 Test design, creation and analysis
2.3.1 Producing the Questions
While the programming group focused on the technical aspects of
creating a computer based assessment, those tasked with writing questions
for the test had to make sure they referred back to the information we had
already collected about online assessments. We started off by looking at re-
sults from our survey in order to determine what types of questions we ought
to be asking. As found in our initial research, multiple choice questions were
the preferred method of answering. The survey showed us that gap filling was the least popular method; however, we decided that we would still include questions of this form in our test, for two reasons. Firstly, it is the most accurate way of seeing whether a student has really understood a question, since they cannot guess the answer. Secondly, we thought it would be interesting to see whether students tended to do worse in these types of questions, as we had hypothesised.
The next stage was to decide what topics to base our questions on.
We wanted to focus on the numerical reasoning style of questions just like
on the currently available employability tests. We did this by researching
these numerical reasoning tests and replicating their style of questions. This
ensured our test was relevant and had the potential of preparing people for
such testing.
Some initial points raised focused on the types of questions we would
have to ask, what topics we would focus on, and how many levels of difficulty
we should have. It was also noted that our questions would have to be both
realistic to program and relevant to research, in order for the results to pro-
vide us with useful information that the statistics group would then be able
to analyse. Each member of the subgroup was then tasked with a different
research assignment. One member focused on how to effectively test different
learner types, while the other two members focused on looking up example
questions at different levels of difficulty. Having done this, it emerged that
online tests naturally cater more for visual learners and not for the other two
learner types [10] [40] [50]. We took the decision not to focus on this aspect when writing our questions, as we would not be able to create different types of questions for a specific learning style other than visual.
Having established that a variety of levels was essential to fulfil our
aim of creating an adaptive test, it remained to decide which difficulties we
would pick. Since we knew that all participants would have a minimum of
GCSE-level mathematics or an equivalent qualification, but not necessarily
any further qualifications, we decided to make this our top level of difficulty.
However, after their preliminary discussion with the statistician Dr Ben Youngman, the statistics subgroup informed us that having more than three levels of difficulty in our test would significantly hinder the statistical analysis of the data later in the study, as we would be unable to create an effective model. On the other hand, we were concerned that too few levels would reduce the range of results: if we had six similar questions at the same level, it was likely that a participant who could answer one question correctly could complete them all. For this reason, we decided to incorporate KS2, KS3 and GCSE-level mathematics.
The final element of the decision process involved reading through
the curriculum for Key Stages 2 and 3, as well as GCSE-level mathematics,
in order to single out the recurring, most important topics so that we could
base our test questions around them [22] [24] [23]. The final decision we made was to write two questions for each of ’percentages’, ’ratios’ and ’algebra’ at Key Stages 2 and 3, and then to write six GCSE statistics questions, which would incorporate these topics. In this way, we would have three multiple-choice and three gap-fill questions at each level of difficulty.
Once we’d made all the relevant choices, it was time to write the questions. We found examples of questions on the topics we were focusing on by looking at teaching resource websites such as TES [2] [54]. We then adapted these to suit our own needs: not only did we want to model questions to resemble currently available online assessments, we also had to generate wrong answers for every question that was to be multiple choice. This was the hardest element of the process, as it involved deliberately making common mistakes with the aim of generating plausible wrong answers. Luckily, this was achievable, and because we diligently wrote down our thought processes, we were able to relay how we’d created these wrong answers to the programming team, so that they had an algorithm to use in the randomisation of questions later in the process.
2.3.2 Programming the Test
In this section, we discuss the writing of our online test.
Our test acted as a vehicle to provide relevant data with which to test the theories we had formed from our research. This made it an integral part of our project outcome, and it was therefore very important to us.
We began meetings regarding the creation of the test very early on as we
were aware that it would be a very time consuming part of our project. In
these, we discussed how we were going to approach the programming aspect.
Firstly, we had to choose the programming languages that we would
use. We looked into a few different methods. Our first idea involved using the
Exeter Learning Environment so that all Exeter students would be able to
easily access the test. We thought this would help with distribution, as this
website is used by all students at the university, however, the programming
behind the website was far too restricting in terms of what we had planned
with regards to coding. It also presented the problem that our results would
be restricted to one university. Another option was a version of Maple that would both code and present our questions, but it became apparent that it would not facilitate certain aspects of our test, such as feedback and randomisation. After exploring these different options with
our project advisor, we decided it was best to use the popular server-side language PHP together with HTML to code the questions, and to store data in a MySQL database. We chose PHP as it is a relatively simple language that integrates easily with HTML, the main language used for the appearance of web pages, in which our questions would need to be written. It was also the most flexible option, so it would not restrict us in the design of our test, and it would enable us to create dynamic web pages involving randomised variables.
This was very important as many of our test aims involved randomisation
and forms, something that PHP would facilitate and so it would enable us
to move information on and off our database effectively. The only limitation was that, before we had access to an online server, we would find it difficult to practise running our code. This was overcome by using XAMPP, free software that replicates a server environment offline. This meant that we could run our test as it was developed, and check its appearance at every stage.
Figure 5: Above is an example of PHP code which we used to generate
Question 1 in our Numerical Reasoning Assessment.
Figure 6: Above is an example of HTML code being echoed in PHP which
we used to submit Question 1 in our Numerical Reasoning Assessment.
The subgroups had researched the current curricula and decided on the layout and contents of the test, so the next step was for the programming team to create it. Firstly, we familiarised ourselves with both PHP
and HTML and got used to writing functions. We used a variety of resources
from the library [13] and the internet [59], as well as using our own previously
acquired skills. We aimed to understand how to print text, show images and
generate tables using HTML so that we could write a well-presented and pro-
fessional looking test. We also had to learn how to interact with our online
database, move data on and off it and store our results. Following this, we
split up the workload between five people, each person being in charge of cer-
tain questions and aspects of the test. The main limitation we came across was the time constraint on programming, given the short ten-week period. Due to our initially low level of programming skill, a significant amount of time was spent familiarising ourselves with the chosen languages and understanding their capabilities.
The starting page of our test provided some preparation informa-
tion on materials the participant would require, as well as explaining the
procedure of the test. The voluntary nature of the test was specified to ensure that participants did not feel pressured and knew they could terminate at any
point. The second page of the test was dedicated to data collection, gath-
ering information on age, gender, subject area, GCSE mathematics grade as
well as how long it had been since they had last studied mathematics. We
also included their university ID as one of the variables, which was then used
as an identifier. This was in case a participant chose to sit the test more than
once, so we would be able to determine whether an improvement in the mark
occurred. The scores awarded were also linked to this identifier so only rows
with a matching identifier would be changed, with a score of one for a correct
answer and a zero otherwise. We also chose to ask the participants what type
of learner they thought they were by providing relevant descriptions, in order
to aid us in determining whether that had an effect on their mark in later
data analysis. Another piece of data collected throughout the test was the
time it took the participants to complete each question, we did this using
timestamps in PHP. This helped to determine whether any cheating took
place, as well as determining the questions which were found most difficult.
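The per-question timing logic described above can be sketched like this. It is a Python stand-in for the PHP timestamp approach we used, and the 2-second cut-off for flagging suspiciously fast answers is our own illustrative choice, not a figure from the project:

```python
import time

class QuestionTimer:
    """Record how long a participant spends on each question, a
    stand-in for the PHP timestamp approach described above."""
    def __init__(self):
        self._start = None
        self._current = None
        self.durations = {}

    def begin(self, question_id):
        # Called when a question page is served.
        self._current = question_id
        self._start = time.time()

    def submit(self):
        # Called when the answer form is submitted.
        self.durations[self._current] = time.time() - self._start

def suspiciously_fast(durations, cutoff=2.0):
    """Flag questions answered faster than a cut-off, which might
    indicate guessing or cheating (the threshold is an assumption)."""
    return [q for q, t in durations.items() if t < cutoff]
```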
All of us already had sufficient understanding of the code for the write-up of the questions, as they were only tables and simple text and so were quick to produce, allowing us to concentrate on the more complex parts of the programming, as described below.
Some of our questions (please refer to Figures 39 to 81 in the Appendix for screenshots of our Numerical Reasoning Assessment) included images, such as pie charts and stick diagrams, to cater for different types of learners, as mentioned earlier in this report.
we attempted to code them instead of just inserting the images themselves
so that we would be able to adapt them, but soon realised it would be an
unrealistic target for the short time and limited skills we had. As a group, we
made the decision to include them as static JPEG images instead, deciding
that the impact of this would be very small. In certain instances we could
avoid this limitation, as we were still able to randomise the questions. For
others, we decided it was more important to meet our time constraints and
generate our statistics than to worry about randomisation.
As we wanted to produce questions with both multiple-choice answers and manual-input answers, two types of code had to be written. The approach to writing the multiple-choice questions was more complex and time-consuming, as realistic wrong answers had to be developed so that the mistakes were believable and the correct answer was not too obvious. However, recording both types of answer as either right or wrong used the same procedure: defining a correct answer, comparing the given answer to it, and assigning a value of one or zero accordingly.
One of the main aims of our project was to build a test that provided
immediate feedback, in order to help students improve as they went along
and provide understanding if they made any mistakes. Therefore, following
every question there was a separate page with a full step by step solution to
show how it should have been approached.
Another goal of ours was to randomise all of our questions. This
involved randomising any values that were used within the questions, so that
although the approach and the formula were the same, the question values
and answers would be different every time the page was opened. We chose to
do this to prevent people from cheating if sitting the test with other people.
Also, it enabled us to see more accurately if people’s performance improved,
in case they sat the test more than once. The process of randomisation made
creating false multiple-choice answers and providing feedback more complex. False answers were created using formulas covering the common mistakes; the values used within them had to be fetched and carried through to the PHP page that submitted the scores. The same page also provided the feedback, so the values were carried through to the worked explanation as well.
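The way randomised values are carried through from the question to the worked feedback can be sketched as follows. This Python sketch stands in for the PHP implementation; the question wording and number ranges are invented for illustration:

```python
import random

def ratio_question():
    """Generate a randomised ratio question whose random values are
    carried through to the step-by-step feedback, mirroring the
    approach described above (wording and ranges are assumptions)."""
    a, b = random.randint(2, 9), random.randint(2, 9)
    total = (a + b) * random.randint(5, 20)   # guarantees a whole-number split
    per_part = total // (a + b)
    share_a = per_part * a
    question = f"£{total} is shared in the ratio {a}:{b}. How much is the first share?"
    # The same variables feed the feedback, so the worked solution
    # always matches the randomised question the participant saw.
    feedback = (
        f"Add the ratio parts: {a} + {b} = {a + b}. "
        f"Divide the total: {total} / {a + b} = {per_part}. "
        f"Multiply by {a}: {per_part} x {a} = £{share_a}."
    )
    return question, share_a, feedback
```

Because the totals are built as multiples of the ratio sum, every randomised instance splits into whole pounds, so the feedback arithmetic is always exact.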
Another one of our initial aims was to make the test adaptive, so
that the next question depended on whether or not you had got the previous
one correct. The purpose of this was to enable people to reach an under-
standing of a topic before moving to a more difficult question. The team
began looking into various methods that would allow us to create banks of
questions with varying difficulties. However, when our statistics team con-
sulted with our statistical advisor, he advised us that this would be far too
hard to model, as we would have many different categories within our vari-
ables. Without the model, we would have been unable to analyse our statistics well or gain any meaningful evidence from them to compare with our research. Also, given our limited programming skills, this would have taken far too long to complete within the time frame. As it was such an unrealistic target, we decided to exclude it, allowing us to concentrate more on our other objectives.
Despite time constraints and initially limited skills, a test capable of gathering the required data was developed within the timescale. The next step was making our test live for participants to sit. We looked into different ways of doing this, but settled on uploading it to our university’s servers. This meant that anyone with the web link would be able to access and sit our test, maximising the number of potential participants. One other option we explored was to pay for an online server, but this would have been an unnecessary cost when free resources were available. Another option was to use our university college intranet servers, but this would have limited respondents, as our test would then only have been accessible to CEMPS (College of Engineering, Mathematics and Physical Sciences) students.
2.3.3 Test Distribution
To ensure that we achieved statistically significant data analysis,
our statistics subgroup required a minimum of 40 responses to the test. We
were aware that we had a short amount of time available to distribute our
tests and that there were many potential difficulties associated with getting
enough participants. As a result, we made a concerted team effort to distribute the test widely and as quickly as possible. We did this using a variety of social platforms such as Facebook and WhatsApp in order to raise awareness about the project, and to provide a web link for people to take
our test. A leaflet was also created to inform people about our test and the
benefits it could provide, which we distributed on campus to encourage a
wider spread of participants in terms of demographic such as degree type
and age (see Figure 7).
Figure 7: A leaflet promoting our Numerical Assessment.
2.3.4 Test Analysis
The first task for the statistics team was to identify what type
of analysis we wanted to carry out on our test data. This needed to be
completed at an early point in the project so we could relay this to the
programming team. The relevant questions were then programmed into the
test. We went about this task by breaking down each of the research sections,
reading all the research findings, and then deciding the relevant statistics we
needed to look into.
1. Why is mathematics important? The Mathematics vs Numeracy Debate.
(a) Look at the correlation between test score and GCSE mathematics
performance, degree and time since studying maths to see if any
of these affects the score.
2. Why do employers test for numeracy skills?
(a) What was the average score? What was the range of scores?
(b) What was the standard deviation of scores? This can identify
whether numerical reasoning tests are able to differentiate between
people.
(c) What is the standard deviation in the score achieved by people
studying the same degree?
(d) Did anybody resit the test? Did they achieve a better score the
second time?
(e) What was the range, standard deviation and mean time taken to
complete the test?
3. Do different learners perform differently on numerical reasoning tests?
(a) Look at the correlation between score and type of learner.
(b) Break down the questions categorically into charts, tables and text
questions. Which type of question got the best score?
(c) Look at the correlation between the type of learner and the score.
Do some types of learners perform better than others?
4. How do people learn through computer-based assessments?
(a) Did people read the feedback? What was the average time taken
between the questions, on the feedback page? Plot the frequency
of time.
(b) Did people perform better on the multiple choice questions or the
manual input questions?
(c) Did people speed up as they took the test?
'Practical Regression and ANOVA using R' [21] states that regression analysis is beneficial because, firstly, predictions of future observations can be made; secondly, the relationship and effect of multiple variables can be assessed; and finally, a general understanding of the structure of the data can be gathered. Therefore, for all the statistics required in each research topic, it was necessary to build a regression model for the test scores. The same text also identifies the steps taken in regression analysis as:
1. Identifying the distribution of the data.
2. Identifying the initial regression model.
3. Carrying out an initial assessment of the goodness of fit of the model.
This would be through hypothesis tests on the variables and numerous
diagnostic plots.
4. Using methods to identify the best model fit.
'Applied Regression Analysis' [5] proposes using stepwise regression to achieve the 'best' regression fit, because it improves the fit while avoiding the use of more variables than necessary. Stepwise regression starts with a one-variable model, then adds and removes variables until the largest coefficient of determination is achieved; the most significant model is thereby identified. Once this best regression is found, we will be able to identify
which variables have the most significant effect on test scores. This is vital
for answering our four research topics. We will also be able to make predictions of future scores, such as: what score would a 'visual-learning girl, studying law, with a grade B in GCSE mathematics, who has not studied mathematics since GCSE' achieve?
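As a simplified, forward-only sketch of this selection idea, variables can be added greedily while the adjusted coefficient of determination keeps improving. Our actual analysis was carried out in R; this Python version with synthetic data and purely illustrative variable names just shows the mechanism:

```python
import numpy as np

def adjusted_r2(y, X):
    """Adjusted R^2 of an ordinary least-squares fit of y on X (with intercept)."""
    n = len(y)
    Xc = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1 - ss_res / ss_tot
    p = Xc.shape[1] - 1  # number of predictors, excluding the intercept
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def forward_stepwise(y, X, names):
    """Greedy forward selection: repeatedly add the variable that most
    improves adjusted R^2, stopping when no addition helps."""
    chosen, best = [], -np.inf
    improved = True
    while improved:
        improved = False
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            score = adjusted_r2(y, X[:, chosen + [j]])
            if score > best:
                best, cand = score, j
                improved = True
        if improved:
            chosen.append(cand)
    return [names[j] for j in chosen], best
```

Because adjusted R^2 penalises extra predictors, the loop stops before soaking up every available variable, which is the property that makes stepwise selection useful here.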
It was realised that we would need to collect as many responses
from our test as possible. We posed a question to ourselves, ’How many peo-
ple need to take our test in order for the results to be significant?’. Having
spoken to Dr. Ben Youngman, a University of Exeter Statistics Lecturer, we agreed that we could not set a definitive 'minimum number', as the distribution of the scores would depend on the scores of those who took the test. It was clear, though, that as few as four scores would be insufficient to build strong arguments from our findings, so as a group we set ourselves the aim of at least 60 entries.
We decided to use R to run all of our statistical analysis. R is a leading tool for statistics and data analysis, and it performs the types of analysis we required very efficiently, such as producing correlation matrices and modelling data. R also integrates easily with other software such as Microsoft Excel, making it straightforward to export our MySQL database, containing all the test data, into a Microsoft Excel spreadsheet and perform our analysis in R from there. Output in R is presented clearly and is easy to interpret. Our final reason for using R was that everyone in the statistics team had used it before, making us familiar with its built-in functions and programming language. Additional reading of 'Practical Regression and ANOVA using R' [21] was also used to refresh and improve our knowledge of R.
Figure 8: An example of R code.
2.4 Report Feedback
As a group, we recognised the importance of getting external feed-
back on our report. Our project’s main aim was not just to create a test, but
also to see how our findings related to literature and to observe their poten-
tial impact on future students. Receiving opinions on our results would give
us a more comprehensive view of our work and would enable us to perform a
more thorough and independent evaluation. We decided to contact experts
via email as we thought this would be the most efficient form of communi-
cation.
Our first thought was to seek a statistician - we needed someone to
evaluate our model and give feedback on our findings. We met with the same
person who had advised us earlier on in our project, Dr Ben Youngman. We
hoped that he would be able to advise us on anything we may have missed.
We also sent our report to Rowanna Smith, the lead Careers Consultant for the College of Engineering, Mathematics and Physical Sciences,
based in the Career Zone at the University of Exeter. We wanted to find
out whether, based on our findings, the university would consider using a
similar test as a resource made available to students in preparation for job
application tests. We also wanted to find whether our results were significant
enough for the University Career Zone to consider a change in the advice
they currently offer students with regards to preparing for these kinds of as-
sessments.
Our final port of call was SHL, a provider of numerical reasoning tests. We wondered if they would consider changing their test-writing methods based on our own assessment and its findings, for instance by including feedback. We also asked whether they would consider taking into account different learner types by adapting their tests to suit a wider range of people and their learning habits.
2.4.1 Skill Development - Graduate Skills
The project we undertook led us to develop a variety of existing skills as well as gain new ones. As the project involved a very tight time frame, careful time management and task delegation were required to ensure that all the different sections of the project came together effectively and on time. To enable this, the project was broken down into separate
sections, which helped us stay on track. These enhanced skills will prove very
useful in later life, as many graduate roles will require efficient management
of many different tasks, most likely with tight deadlines. Not only did we
have to manage our time well by setting realistic targets, but we also had
to adapt to changes and challenges that occurred along the way. Over the
course of the project, this enabled group members to become more flexible,
something required in all future aspects of life.
Working in a team has been an essential part of this project, without
which our outcome would have been completely unattainable. The ability to
work in a team is an invaluable skill for later life and prepares us for situations
both in and out of the workplace. The ability to communicate effectively with the other members was crucial in enabling the team to stay on track and to remain transparent, so that we could be aware of any potential problems. As a graduate, this is vital for being part of a working environment. Another
skill acquired during this project was the ability to research quantitatively
and qualitatively as well as to disseminate information and synthesise oth-
ers’ ideas. This process was approached in different ways, including a vast
amount of reading and contacting both employers and academic members
of staff, resulting in a well-rounded background for the report. Research
skills are essential to many roles, either directly for graduates in technical
roles, or indirectly as transferable skills by improving general analytical and
summarising abilities. Designing the test to collect our data developed the
team’s problem solving skills as we had to explore several ways to achieve
our programming criteria. It also gave us all a basic understanding of one
of the most popular scripting languages on the web, an invaluable skill to
many employers. The team has also acquired skills in data collection and
statistical analysis in order to understand and present the project’s findings,
something that many employers look for and value highly.
A large aspect of our project involved presentation, both as small
progress reports and as a final summary of our report. Through this, all
group members had a chance to present their work to an audience, gaining
beneficial speaking and performance practice, something we get very little chance to do given the nature of our degree. This enabled us to gain the vital social skills that employers hold in high regard and that make up a large component of job applications.
3 Findings
3.1 Survey
When it came to collecting survey results, it was reasonably simple to analyse our data. Because we had created the survey in Google Forms, we could monitor responses as they came in. Google Forms also produced some basic statistical representations for us, so we immediately had an overview of the key information. Overall, we gained 79 responses, much higher than our aim of a minimum of 40 respondents. In terms of demographics, we noticed a majority of female participants, with over 70% being women. Also, almost 70% of our respondents were in their third year of university and so dominated our responses (see Figure 9). This was likely because our own group was made up of third-year students who were predominantly female. However, given the nature of our survey and the questions asked, we did not feel that this would cause any issue, especially since third-year students are the most likely to have come into contact with employability tests, and should also have a good idea of how they learn best at this stage in their education.
Figure 9: Pie Chart of Gender and Year of Study of participants in the
Survey.
Figure 10: Bar Charts of responses in two survey questions.
The first set of questions in the survey gave us information on the
different ways in which people like to learn and to be tested. The survey
worked in two ways. Firstly, it acted as preliminary data for our research, gathering more information and current opinions on online tests, which we planned to compare with our test findings later in the process. Secondly, the survey provided new data to set against what the group had already learned from the research
carried out. We found that the majority of people preferred multiple choice
questions on online assessments, concurring with our research findings that
this is a popular, commonly used method. It is worth noting that, because possible answers are always provided, these questions do not require as much original thought on the part of the student. It also means that students already have a baseline probability of selecting the correct
answer, in our case on average 20%, something that may influence people’s
preference for this style, based on perceived comparative ease. The fact that
this style was preferred was passed along to the subgroup tasked with writing
the online assessment questions, so that this could be taken into account.
It was also seen that people feel they benefit significantly from feedback.
This matches the opinion we found when conducting research, based on a
Plymouth study [36]. This suggests that not only do people want feedback,
but that a student’s results can improve significantly as a result of it. This
confirmed our decision to include feedback as a major component of our own
online test, to ensure that people would be able to learn from their mistakes
in previous questions.
In terms of Mathematics vs. Numeracy, there was a mixture of
results. There were originally mixed opinions from people when asked if they
believed their mathematical skills had deteriorated since they had stopped
studying mathematics, with the majority of people taking a neutral stance
(see Figure 11). The second largest response was ’slightly agree with the
statement’, implying that slightly more people may feel this to be true. This
may be slightly skewed, as people still currently studying mathematics are likely to strongly disagree that their abilities have deteriorated, given that they are still using them. This undermined the purpose of the question, which was to investigate people who have stopped studying maths and consequently do not use it as often. It may also explain the large spike in people strongly disagreeing with the statement, which made it harder to analyse how people perceived their maths skills, as many of the results shown were not relevant.
Figure 11: Bar Chart of responses on deterioration of mathematical skills.
Figure 12: Bar Chart of responses on deterioration of mathematical skills,
excluding mathematics students.
To combat this problem, we decided to exclude mathematicians from our data and to repeat our statistics (see Figure 12). This ensured that all respondents had finished studying mathematics, so we could give a fuller representation of the deterioration of mathematics skills. From our new
calculations, we then produced a graph similar to our expectations, showing
that most people felt that their skills had somewhat deteriorated since they
had last used mathematics. This clearly agreed with our research, which
showed a strong difference between people who currently study mathematics
and those who had stopped. We could compare this with the similar effect of unemployment identified in our research. It was also similar
to the study on nurses [20], who performed worse on a similar test after
a gap year. However, our data consisted more of qualitative opinions than
quantitative results. This slight difference meant that we could not draw any
solid conclusions from comparing the two, but could, however, take note of
the strong similarities. One limitation of our data may have come from the
differing opinions on when participants classed themselves as having stopped
studying maths. Some students who study more scientific or quantitative
degrees may regard themselves as still studying mathematics, given that they use it regularly in their university work, while others will claim not to study mathematics any more, since the subject itself is not contained in their degree title. Despite this, we felt the discrepancy did not
impact our results too heavily, as such students would still have been likely to
be of the same opinion when it came to rating their mathematical ability, and
so we could still assess the difference. Another slight limitation in comparing our data with the literature was that, in some similar studies, those tested had been out of any form of study or work at the time, whereas the students in our survey were all still in academia. This would certainly have affected
the extent to which they felt their mathematics skills had deteriorated over
time, possibly making our results less pronounced than they otherwise would
have been.
In addition, our survey showed us that 67.5% of people (see Figure
13) believed numeracy and mathematics to be different things, which agreed
with much of our research regarding the Mathematics Vs. Numeracy debate.
This shows that the general consensus is that they are different disciplines
and require different skills, even if they technically overlap by definition. It
would have been beneficial to know why the students thought this, and if
they agreed with our research findings on potentially teaching them as two
separate subjects. However, due to the design of our survey we were limited
to a few set answers and so it is difficult to say how consequential these
results are. We attempted to fill any potential gaps in a participant's knowledge by giving official definitions of both words, allowing them to make a well-informed decision, which may have mitigated some of this problem.
Figure 13: Pie Chart representing the opinion of participants on Mathematics
Vs. Numeracy.
Figure 14: Pie Chart representing how participants feel they learn best.
Since another large section of our research involved different kinds of learners, we included questions on this in our survey. Our research covered several learner types, but we chose to include only the main three we had focused on. The team found that the majority of participants fell into a set category, with less than 4% being unsure (see Figure 14).
The smallest proportion was of those who believed themselves to be audi-
tory learners; however, this was still over a fifth of respondents. The largest
section was of the visual learners with 41.8% of people placing themselves in
this category. We mitigated the risk of people not being aware of different
types of learning or what category they may fall into by getting people to
say which description fitted them best, instead of them picking from a list
of unfamiliar definitions. However, there was still scope for people to have
misunderstood and therefore picked a category despite not being sure, which
may limit the reliability of our data. Having said this, our research showed
that most people are a combination of these different learning styles, so some crossover was always expected. In terms of how the different learner
categories work, we believed that visual learners were likely to perform better
for our chosen type of online numerical reasoning test, leaving the others at a
disadvantage. When asked in our survey whether participants believed these
online tests cater for different learners, almost a third of them responded
negatively (see Figure 15). This helps to back up our research and hypoth-
esis by showing that many people do not feel that their learning abilities
are catered for. There is always the possibility of this proportion being over
estimated by people who do not perform well in these tests in general or feel
they should have performed better regardless of what type of learner they
are. Nevertheless, as we still have a strong majority this should not have had
a significant effect, and thus our data still shows that a significant amount
of people feel that they are not examined effectively in online tests. We were
able to test this further in the results from our own numerical reasoning test.
Figure 15: Pie Chart representing the opinion of participants on whether
Computer-based Assessments cater for different types of learners.
3.2 Test
Our numerical assessment consisted of 20 questions split into three difficulty levels: KS2, KS3 and GCSE. The average mark achieved was 15.28. From Figure 16, it can be seen that the majority of participants scored highly, with over 50% achieving a score greater than or equal to 15. Figure 17 supports this, showing an interquartile range of 6, from a score of 13 to a score of 19, indicating a strong concentration of high scores. The results are negatively skewed. The highest score achieved was 20, showing that full marks were attainable, whereas the lowest score achieved was 5.
Figure 16: Histogram of Total Score.
Figure 17: Boxplot of Total Score.
Figure 18 supports the negative skewness of the scores. There is an overall bell shape, suggesting an approximately normal distribution, with the slight shift of the peak to the right indicating the negative skew.
Figure 18: Density Plot of our model.
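The sign of the skew can be checked numerically from the scores themselves. As a sketch in Python with illustrative scores (not our actual data set), using the Fisher-Pearson coefficient of skewness:

```python
def sample_skewness(xs):
    """Fisher-Pearson coefficient of skewness: negative when the
    distribution's tail extends towards lower values, as with a
    cluster of high test scores and a few low outliers."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    return m3 / m2 ** 1.5
```

A symmetric sample gives a skewness of zero, while a score list concentrated near the top of the range with a low outlier gives a negative value, matching the shape of the density plot.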
To further analyse our data, we will break down the statistics into
the four research topics previously mentioned.
3.2.1 The Maths Vs. Numeracy Debate. Why is mathematics
important?
The initial hypothesis was that a participant's score would deteriorate as the number of years since studying mathematics increased. Surprisingly, Figure 19 shows little relationship between score and years since studying mathematics: the line of best fit is an almost horizontal line about the mean score. The correlation coefficient of −0.21, however, indicates a weak negative correlation between the variables.
Figure 19: Scatter plot showing the total years since studying mathematics
vs the total score.
Furthermore, from our research into Numeracy vs Mathematics, it
is implied that numerical reasoning assessments do not test the skills which
participants learn at GCSE-level maths. Therefore, years since studying
mathematics has little effect on the score achieved. Our findings support
this argument. However, due to the fact that the average age of participants
in our numerical reasoning assessment was 20.24 and the average number of
years since studying mathematics was 1.68, this does not reflect the whole
population.
The correlation between GCSE mathematics grade and score is
shown by Figure 20. It shows that a higher grade achieved at GCSE resulted
in a higher score in our numerical reasoning test. The mean score achieved
by a participant with grade B at GCSE was lower than the mean score for
an A or A* candidate. The lowest score achieved by an A* grade participant
is higher than the lower quartile of A and B grade participants. The highest
score achieved by any B grade participant is lower than the average score
of an A* grade participant. From these findings, we can see that a strong
mathematical background can result in a significantly higher numerical rea-
soning test score. As the number of years since studying mathematics has
little correlation with the score achieved, this shows that mathematics GCSE
grade and actual mathematical ability affect a participant’s score more. This
is again supported by Figure 21, which shows participants studying a math-
ematical degree. It is assumed that these students have strong mathematical
abilities, and that this is the reason they achieved higher scores. We categorised 'mathematical degrees' as Economics, Business, Medicine, Mathematics and Science. The lowest mean score was for participants studying
Humanity degrees. Interestingly, those studying a non-mathematical science
(such as Biology) scored higher on average than those studying a mathemat-
ical science. However, Figure 21 shows that these results are actually very
close. Therefore, we can interpret from this that all sciences require some
mathematical skills.
Figure 20: Boxplot of Test Scores and GCSE Mathematics Grade.
Figure 21: Boxplot of Test Scores and Degree.
3.2.2 Why do employers use numerical reasoning testing?
As stated above, the average score achieved was 15.28, with a standard deviation of 4.12; the standard deviation measures the spread of the score results. Initial research into why employers use numerical reasoning assessments showed that these tests filter out applicants and help to differentiate between candidates with very similar applications.
As our lower quartile is 13, 75% of participants achieved a score of at least 13. If an employer applied a filter that cut out candidates scoring below 13, 25% of our participants would not have passed the test. This shows that numerical reasoning tests can be a useful tool to quickly remove weaker candidates from an application process.
The standard deviation of 4.12 indicates a large spread in scores. This makes the test a useful tool for differentiating between candidates, as the results are spread over a wide range of values rather than clustered at similar scores. If everyone scored 15, all candidates would have to complete further assessments to gauge who was the best applicant; varied scores reduce this problem.
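The summary statistics and the effect of a cut-off filter can be sketched as follows. This is an illustrative Python example with made-up scores, not our real data set (which was analysed in R):

```python
import statistics

def score_summary(scores, cutoff):
    """Summarise test scores and report how an employer's cut-off
    filter would thin the field (illustrative sketch)."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)  # spread: how well the test separates candidates
    passed = [s for s in scores if s >= cutoff]
    return {"mean": round(mean, 2),
            "sd": round(sd, 2),
            "pass_rate": len(passed) / len(scores)}
```

A larger standard deviation in the output corresponds directly to the differentiation between candidates discussed above.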
Figure 21 shows that the majority of interquartile ranges of the dif-
ferent degree types are large. We see that applicants with similar degrees,
where one would expect similar mathematical ability, still have a varied range
of score results. This is useful for employers as they can use numerical rea-
soning assessments to differentiate between applicants with the same degree
title.
Initially, we wanted to look into whether people had repeated the test to see if their scores improved, because our research and survey findings showed that feedback and practice on numerical tests should improve scores.
The mean time to take the test was 19.47 minutes. This means on
average people took 57.81 seconds on each question. This justifies the reason
why employers enforce tight time limits on numerical reasoning assessments
(commonly a minute or less per question). This is not necessarily a method of filtering out participants but, as our timings show, it does mean applicants are put under pressure when completing the numerical reasoning test.
Employers are keen to find out if a potential employee can work under pres-
sure and in a set time frame. The level of difficulty of the numerical test can
also be adjusted by changing the time limit. If our numerical test had a time
limit of 15 minutes, less than 50% of participants would have been able to
finish the test. From initial research we found that numerical reasoning tests
are often used even in applications where numerical skills may not actually
be necessary. From our survey we found that 37.2% of people believed it
was unfair to be numerically assessed in their career job applications and
felt they were at a disadvantage to others because they were not ’good at
maths’ and ’had not studied it in a long time’. However, from our findings,
we can say that employers can increase the time limit on tests, for example
in our test to over 35 minutes, so that every participant is able to complete
the test in their own time and not miss questions because their time ran out.
This is concluded from the fact that box plot and whiskers in Figure 22, is
completely below 35 minutes, and only outlier times are above.
Figure 22: Boxplot of Time Taken to complete the Test.
3.2.3 Do different learners perform better on numerical reasoning
tests?
Figure 23 shows that visual learners on average achieved a higher score than auditory or kinaesthetic learners; visual learners taking our test had the highest average and the smallest range of scores. Literature research done at the beginning of our project, along with our initial survey findings, suggests that the numerical reasoning assessments used by employers online do not cater to auditory or kinaesthetic learners, with 64.1% of the people who took our survey agreeing. Being online limits the ability to make a numerical reasoning test practical and hands-on to suit kinaesthetic learners. Audio numerical reasoning tests are available; however, they are uncommon and usually only used for participants in special circumstances (such as those with visual impairments).
Figure 23: Boxplot of Test Scores and Learner Type.
Generally, people performed better in questions involving a visual
aspect, such as a chart or graph. The average pass rate on these questions
was 81.7%, whereas for text questions it was slightly lower, at 68.6%. This
may be because the image or table breaks down the information making it
easier for all learners to digest the figures, whereas paragraphs of text and
figures cater more towards visual learners.
3.2.4 How do people learn through computer-based assessments?
From our results, we can determine that the majority of participants neglected to read the feedback provided. The average times spent on the feedback pages for the first four questions were 6, 5, 9 and 5 seconds respectively, which is not enough time to read, understand and learn from the feedback. Research has proposed that reading feedback improves scores, for example Rob Lowry in 'Computer aided assessments - an effective tool' [36]. Our initial survey, Figure 24, also shows that 89.8% of people thought feedback would be a useful tool in an online test. However, as our numerical reasoning assessment was presented as a 'test' rather than a casual learning resource, people's priority may have been to finish the test rather than learn from it.
Figure 24: Bar chart of opinion on feedback from the survey.
If every multiple choice question were guessed, a participant would have a 20% chance of getting each one correct, so the expected score across the 12 multiple choice questions would be 2.4 out of 12, an average pass rate of 20%. Our results show a pass rate of 81% on the multiple choice questions. This is significantly higher than 20%, suggesting that few (if any) candidates guessed all their answers.
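Under pure guessing, the number of correct answers follows a binomial distribution, which makes the 2.4-out-of-12 expectation easy to verify. A short Python sketch of the calculation (our actual analysis was done in R; the question and option counts match our test):

```python
from math import comb

def guessing_distribution(n_questions=12, n_options=5):
    """Binomial distribution of correct answers when every multiple
    choice question is guessed at random (12 questions of 5 options
    each, as in our test)."""
    p = 1 / n_options
    expected = n_questions * p  # 12 * 0.2 = 2.4 correct on average
    pmf = [comb(n_questions, k) * p**k * (1 - p)**(n_questions - k)
           for k in range(n_questions + 1)]
    return expected, pmf
```

The probability mass function also shows how unlikely a high score is under guessing, which is why the observed 81% pass rate rules out widespread guessing.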
The average time taken and average pass rate for multiple choice questions were 50 seconds and 81% respectively; for fill-in-the-blank questions, the average time taken was 53 seconds and the average pass rate was 69.5%. This suggests that multiple choice questions are easier and that a candidate has a stronger chance of scoring higher on them. Simply put, if their answer is not one of the multiple choice options, they know it is wrong. In addition, if their answer is similar to one of the options available, a participant can select that option and still have a chance of getting it correct. Neither is possible in a 'fill in the blank' question. This is supported by our survey, in which 42.2% of people preferred multiple choice questions out of 8 different methods.
Figure 25 shows that the average time taken to complete each question
in our numerical reasoning assessment follows no trend: the line graph has no
pattern and looks random. If people had learnt from the feedback provided, we
would expect the time taken on each question to fall as their understanding
of the questions increased. It became apparent that the feedback we
provided was not used, so we cannot support our initial expectation. In addition,
the incorporation of three difficulty levels (KS2, KS3 and GCSE) could have
counterbalanced any decrease in time taken, as the questions should have
been getting more challenging.
Figure 25: Line graph of average time taken.
Furthermore, we looked at the average pass rate of the questions
in each difficulty category. We divided our test into three categories: KS2,
KS3 and GCSE. Figure 26 highlights that the average pass rate fell as
the level of difficulty increased from KS2 to KS3: the average pass rate for
KS2 level was 87.3%, whereas the pass rate for KS3 was 72.0%. The average
pass rate was consistent from KS3 to GCSE level, both being 72.0%. Our
research supports the idea that employers can use numerical reasoning tests
of different difficulty levels to control how many applicants pass through
to the next stage of the application process. Participants taking a KS2 level
numerical reasoning test would, on average, achieve a higher grade than
those taking a KS3 or GCSE level test.
Figure 26: Bar chart of question category and average pass rate on the
questions in that section.
3.2.5 Regression Modelling
The density plot in Figure 18 supports the hypothesis that score
results follow a normal distribution (as previously stated, this can be concluded
from the bell-shaped figure). The first multiple linear regression model fitted
involved the following variables: degree, years since studying mathematics,
GCSE mathematics grade and type of learner. For research into our four
topic questions, we need to evaluate the effect all these variables have on the
overall score of the participant. The full summary of our regression model
can be viewed in the appendix. As degree, GCSE mathematics grade and
type of learner are categorical variables, they are treated in R as factors
with levels.
The regression formula for this model is: Y = 19.084 − 0.501X1 −
3.266X2 − 2.254X3 − 0.291X4 − 0.006X5 + 3.004X6 − 3.817X7 − 2.486X8 −
1.093X9 + 0.170W − 4.178Z1 − 3.060Z2 − 2.088K1 − 0.247K2,
where Y is the test score. By using factors we limit the auxiliary
variables X1, X2, X3, X4, X5, X6, X7, X8, X9, Z1, Z2, K1, K2 to binary
values (0 or 1). The X variables relate to degree, the W variable to years
since studying mathematics, the Z variables to GCSE mathematics grade
and the K variables to the type of learner.
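The factor coding described above can be illustrated with a small simulation. The report’s analysis was carried out in R; the sketch below is a hypothetical Python equivalent on toy data (the sample size and coefficient values are ours, chosen only to mirror the dummy-variable structure, with grade A* as the baseline level).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sample: a categorical GCSE grade becomes 0/1 dummy columns, with one
# level (A*) absorbed into the intercept as the baseline.
grades = rng.choice(["A*", "A", "B"], size=60)
z1 = (grades == "A").astype(float)   # Z1 = 1 if grade is A
z2 = (grades == "B").astype(float)   # Z2 = 1 if grade is B
w = rng.uniform(0, 5, size=60)       # years since studying mathematics

# Simulate test scores from known coefficients plus noise.
y = 17.4 - 4.6 * z1 - 5.6 * z2 + 0.2 * w + rng.normal(0, 1, size=60)

# Design matrix: intercept column plus the dummy and continuous columns.
X = np.column_stack([np.ones_like(w), z1, z2, w])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print("fitted [intercept, Z1, Z2, W]:", np.round(coef, 2))
```

Fitting recovers coefficients close to the simulated ones, which is exactly how R’s lm interprets a factor: one coefficient per non-baseline level, switched on or off by the 0/1 auxiliary variable.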
The p-values for the variables are: Degree = 0.061498, Years since
studying maths = 0.787063, GCSE mathematics grade = 0.007201 and Type
of learner = 0.224736. (These can be found in the ANOVA table for the
model in the Appendix, Figure 85.) At a 10% significance level, the
coefficients for Degree and GCSE mathematics grade are significantly different
from zero (0.061498 and 0.007201 are smaller than 0.1), whereas Years
since studying maths and Type of learner are not (0.787063 and 0.224736 are
greater than 0.1). Therefore, at a 10% significance level, the variables Degree
and GCSE mathematics grade do significantly affect the test score, whereas
Years since studying mathematics and Type of learner do not. At a 5%
significance level, GCSE mathematics grade is the only variable affecting the
test score; the other coefficients would not be significantly different from zero.
The adjusted coefficient of determination is 0.2806, meaning that
around 28.06% of the variation in the test score is explained by the variables
used. We then used diagnostic plots created in R to help us assess the
goodness of fit.
For the model assumptions to hold, the residuals should be normally
distributed about zero. The ’Density estimate of Residuals’ in Figure 27
shows a bell-shaped graph peaked at about 0, so the normality assumption
on the residuals is met.
In the ’Residuals vs Fitted’ graph we can see the residuals are
randomly, yet evenly, distributed around the horizontal zero line. This
supports our multiple linear regression assumptions, as it indicates a linear
relationship between the variables and a constant variance. The residuals
of a multiple linear regression model are the differences between the observed
values of the dependent variable y, the test score, and the fitted values ŷ
given by the model. The graph has a few residuals with absolute value greater
than 6; these stand-out residuals suggest there are a few outliers in our data set.
In the ’Normal Quantile-Quantile’ graph, the residuals stray
slightly from the 45-degree line; a perfect y = x line would indicate a perfect
fit. However, the deviation is not drastic, and the majority of the standardised
residuals, especially those around 0, sit close to the line. This gives further
support to our model assumptions of linearity and normality. From
the ’Residuals vs Leverage’ graph in Figure 27, we can see that there are
two influential observations, marked 20 and 44. Leverage is a measure of
how much each data point influences the regression model, and a
standardised residual with high leverage can distort the response of the
regression model. Nevertheless, both of these points have a Cook’s
distance (a commonly used estimate of the influence of a data point) of less
than 0.5, and hence are not influential enough to have an impact. They
would be considered too influential only if their Cook’s distance were greater
than 1.
All of this brings us to the conclusion that the assumptions of the
regression appear to be upheld and that the fitted model is a good one that
can be relied upon.
Figure 27: Top left plot is the density estimate of the residuals. Top right plot
is the residuals versus the fitted values. Bottom left is the normal quantile-
quantile plot of the standardised residuals. Bottom right is the standardised
residuals versus their leverage.
Using stepwise regression in R, we determined that a suitable reduced
model involves only the GCSE mathematics grade.
The new model is Y = 17.3929 − 4.6151X1 − 5.6429X2, where
X1 and X2 are the dummy variables for GCSE mathematics grades A and
B respectively (A* being the baseline). We interpret this as follows: a
participant with an A* grade at GCSE mathematics is predicted a score of
17.39 in our numerical assessment, a participant with an A grade a score of
12.78 (17.3929 − 4.6151 = 12.7778), and a participant with a B grade a score
of 11.75 (17.3929 − 5.6429 = 11.7500).
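A minimal sketch of prediction with the reduced model, using the coefficients reported above (the function name and dictionary layout are ours):

```python
# Reduced model: Y = 17.3929 - 4.6151*X1 - 5.6429*X2, baseline grade A*.
coefficients = {"intercept": 17.3929, "A": -4.6151, "B": -5.6429}

def predict_score(gcse_grade: str) -> float:
    """Predicted test score for a given GCSE mathematics grade."""
    return coefficients["intercept"] + coefficients.get(gcse_grade, 0.0)

for grade in ["A*", "A", "B"]:
    print(grade, round(predict_score(grade), 2))
# A* -> 17.39, A -> 12.78, B -> 11.75
```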
The explanatory variables not selected are not necessarily unrelated
to the response variable; they simply do not add more information than is
already provided by the GCSE mathematics grade variable.
It could be that there was multicollinearity in our initial model,
which is why this model with only one variable is viewed as the ’best’.
Multicollinearity occurs when variables are highly correlated, so that one
variable can be used to predict another. GCSE mathematics grade, degree,
and years since studying maths could all be highly correlated: for example,
someone taking a humanities degree will typically not have studied
mathematics since secondary school. By eliminating the other variables, we
remedy the multicollinearity.
The p-value for GCSE mathematics grade (as shown in the ANOVA
table in the Appendix, Figure 86) is 4.708e−05, which is smaller than 0.01.
Therefore, at the 1% significance level, the coefficient for GCSE mathematics
grade is significantly different from zero, and so it has a marked effect on
the test score.
The adjusted coefficient of determination is 0.3177, meaning that
about 31.77% of the variation in the test score is explained by the GCSE
mathematics grade.
As before, the graphs in Figure 28 help us analyse the goodness of
fit of the model.
The ’Density Estimate of Residuals’ graph appears bell shaped.
However, the normality assumption does not appear to hold as well as in our
initial model: there is some negative skewness, as the peak sits to the right
rather than centred about zero. The linearity assumption may also not be
supported. The ’Residuals vs Fitted’ graph shows residuals that are not
evenly and randomly distributed around the zero line, which undermines the
assumptions of linearity and constant variance. Additionally, the data points
in the ’Normal Quantile-Quantile’ plot do not sit as close to the 45-degree
line: the points follow the direction of the y = x line, but not as closely as
in our initial model with all the variables. Finally, in the ’Residuals vs
Leverage’ graph, there are again two potentially influential observations. In
contrast to our initial model, one of these (point 44) has a Cook’s distance
greater than 0.5, so this data point could be considered to have some
influence on the regression model; as its Cook’s distance is still less than 1,
it is not a significant influence.
Based on these results, the model may not fit perfectly. The initial
model with all four variables appears to be a better fit than the regression
model created using stepwise regression.
Figure 28: Top left plot is the density estimate of the residuals. Top right plot
is the residuals versus the fitted values. Bottom left is the normal quantile-
quantile plot of the standardised residuals. Bottom right is the standardised
residuals versus their leverage.
Our findings show that a model with the variables degree, years
since studying maths, GCSE mathematics grade and type of learner suggests
all of these affect the score achieved by the participant. Therefore, all these
variables should be noted and taken into consideration when employers use
numerical reasoning assessments in job applications. However, the model
with just GCSE mathematics grade has a higher adjusted coefficient of
determination and a variable significant at the 1% level. Therefore, GCSE
mathematics grade on its own would be a good indicator of test score.
Through our two models we can now predict the score a particular
participant would achieve. What would a participant score if they achieved
an A in GCSE mathematics, are studying Languages at degree level, have
not studied mathematics for 3 years and are an auditory learner? Our initial
model predicts a score of 13.16; the stepwise regression model predicts 12.78.
The models give similar scores, both rounding to 13.
To conclude, even though our test results were slightly negatively
skewed, and the average participant age was 20.24 (so our sample might not
represent the entire population), our findings show that a strong mathematical
background can result in a significantly higher numerical test score. As can
be seen in Figure 20, those who achieved an A* in GCSE Mathematics have
a higher mean score than those who achieved a B grade. Our findings also
support our hypothesis that visual learners would achieve a higher score, on
average, than auditory or kinaesthetic learners: visual learners taking our
test had the highest average and the smallest range of scores. Thus it is
clear to see why employers use numerical tests to filter out weaker candidates
in a job application process.
There may be limitations in our data because there are other
variables, which we have not considered, that could affect numerical reasoning
test scores: for example, literacy skills, learning disabilities, A level
Mathematics grade (where applicable) and the number of practice numerical
reasoning assessments previously taken.
3.3 Feedback Findings
Our final meeting with Dr Ben Youngman concluded that our
approach to the model was ’logical’ and appropriate to be distributed to
external sources for feedback. This was helpful in reassuring us that our
findings were significant and would be useful for employers to see.
The team went to see Rowanna Smith, the lead Careers Consultant
for the College of Engineering, Mathematics and Physical Sciences, based in
the Career Zone at the university, to receive feedback on our report, as
previously mentioned. Her response was positive overall, commenting that
our regression model looked broadly correct. She thought that having a
resource similar to our numerical reasoning assessment made available to
students would help ’bridge the gap’ between the type of tests they are used
to in their educational career and the type used by employers. She also
said that she would be likely to provide workshops to aid non-visual learners
in preparing for reasoning tests, since our findings showed that the numerical
reasoning tests used by employers favour visual learners. When asked if she
thought our report could have an impact on the companies who make
numerical reasoning tests, she responded that it would be incredibly difficult
to adapt these tests for all types of learners: they are a cheap way for
employers to filter out a large number of applications, and it is unlikely this
will change in the future.
Human Resources departments are key to the job application
process in companies. They are responsible for the selection of applicants,
from processing the initial online application and issuing online reasoning
tests, all the way through to assessment centres and company interviews.
We thought that contacting the Human Resources departments at KPMG
and at the University of Exeter would give us an idea of whether, from the
information presented in our findings, they would:
1. Recognise that different types of learners perform differently in online
reasoning tests, putting some people at a disadvantage.
2. Consider offering different options for different types of learners.
3. Provide practice tests with feedback.
4. Feel it is fair to use numerical reasoning tests as a filtering process,
when GCSE mathematics grade significantly affects the test score.
Unfortunately, they did not respond, so we cannot draw conclusions
on these questions from a Human Resources point of view. This could be
investigated further in the future.
In addition, we made contact with SHL, a company that creates
reasoning assessments. We wanted to see whether, given our findings, they
would consider adapting their tests with the four questions above in mind.
Again, regrettably, we received no response.
4 Conclusion
As illustrated throughout our findings, we drew many parallels with
our research. In this project, we set out to examine current online
assessments, with the aim of creating our own test that improves on what is
already available. As mentioned previously, we decided to focus specifically
on numerical reasoning tests to give us an achievable goal. The objectives of
this project were to research the following topics: how people learn in online
assessments; why employers test for numeracy skills; the difference between
numeracy and mathematics and why both are studied; how mathematical
ability deteriorates over time; and finally, the different types of learners and
the effect of learner type on performance in numerical reasoning assessments.
As mentioned before, a survey was also carried out to gather opinions from
students who undertake these tests. Based on this research, an online test
was developed to try to resolve some of the issues found, such as the lack of
feedback and the lack of tests that cater to different types of learners. Once
the test was developed and the data analysed, we can conclude that the aim
of the project has been met. However, due to limitations throughout our
project, such as the short time frame and our lack of experience with the
software used, there is still scope for further research.
When looking into whether mathematical ability deteriorates with
age, we found a strong link between the years since studying mathematics
and overall performance in our online test. We saw that, generally, the
longer it had been since participants had studied mathematics, the worse
they performed. We can conclude from our statistical findings that a
student’s mathematical ability deteriorates by a measurable amount each
year after they stop studying mathematics. This is matched by the data we
received from our survey, where we found that 39% of people believed their
mathematics skills had deteriorated and the majority of the remaining people
held a neutral stance, meaning more people agreed with the statement than
opposed it. This concurs with our original hypothesis, formed from the
research in the preliminary findings section. Relatedly, our hypothesis that
those studying mathematics or more science-related degrees would achieve a
higher score in our numerical reasoning test was confirmed: we saw that the
mean score of those studying humanities and language degrees was
significantly lower than the mean score of those on science-based degrees
(e.g. economics, biology, medicine).
We then looked at our results in the context of the ’Mathematics vs.
Numeracy’ debate. Our findings suggest that, although our test was
numerically based, there is still a link between mathematics and numeracy,
since those from scientific subjects performed better. In addition, we found
that 67.5% of people believed mathematics and numeracy to be distinct
disciplines in their own right, demonstrating that a perceived difference
exists. All of the above concurs with our research, which suggested that
numeracy is the real-world application of mathematics. Despite having
differing definitions and perceived meanings, the two inherently rely on one
another, which agrees with our previous hypothesis.
From our research we found that visual learners tended to achieve
higher scores than other types of learners, such as auditory and kinaesthetic.
Conclusions drawn from our original research fully support this: visual
learners perform better on visually aided questions in graph or table form,
making it easy for them to digest visual information. Therefore, numerical
tests could put other types of learners at a disadvantage. Further analysis
found that 64.1% of people agreed that numerical reasoning tests cater only
for visual learners.
Through our findings we successfully managed to collect and
statistically analyse data for all of our chosen research hypotheses. From this
we observed significant trends and results relating to our original problems,
formulated from the preliminary research carried out. We believe our
findings are statistically significant and solidly support our initial research.
We also believe that, due to the relevance and clarity of our findings,
our work could have several wider implications, including for the way in
which people prepare for psychometric tests. Our findings on people’s
preference for learning through feedback could change the way companies,
employers and education centres approach preparation for numerical
reasoning assessments. Organisations assisting with preparation for these
tests could improve their own practice tests by including this style of
feedback, enabling participants to learn from their mistakes. In addition,
extra help and attention could be given to non-visual learners, as visual
learners have been found to perform better on numerical reasoning
assessments than others. This could help overcome any potential
disadvantages non-visual learners may face when entering employment.
Our findings highlight the necessity of numerical skills in the wider
world and across all professions. As major employers confirm it is an
important topic, there seems to be definite scope for a universal definition of
numeracy. Our findings have also stressed the need for the UK Government
to consider adding, or placing increased emphasis on, the teaching of
numerical skills at school, in order to benefit future employability. GCSE
mathematics grade was a factor affecting people’s performance, and the
strong link between numeracy and mathematics reinforces the need for a
strong education in mathematics.
In addition, our project provides scope for continued research in this
area, for example through further and more extensive tests. Our findings are
also significant enough for us to believe that many different factors determine
one’s ability in a numerical reasoning test.
5 Evaluation
5.1 SWOT Analysis
The following section concludes our project and provides an overall
evaluation. In order to form a comprehensive, reliable and valid conclusion
and evaluation, we shall use a well-renowned method of evaluation known as
SWOT analysis. The acronym stands for Strengths, Weaknesses,
Opportunities and Threats [55]. Using this evaluation structure, we aim to
provide a well-rounded, detailed and, most importantly, honest evaluation.
In our experience, the vast majority of evaluations focus only on strengths
and weaknesses. SWOT analysis allows us to reflect more on what threats
exist within this project, and to provide an important section on the future
opportunities our project could lead to.
5.1.1 Strengths
1. Teamwork
As a team we have worked extremely well together. There has
very rarely been any disagreement over the direction our project should
take, and when there was, each member was able to express and explain
their views openly, without restriction or fear of doing so. Each team
member has played a role in informing and guiding other members whenever
they were unsure of what they needed to do. Overall, our team has been
extremely committed not only to completing this project on time but,
importantly, to going above and beyond, completing it to a standard that
we are proud to call our own. As a team we maintained extremely good
communication and enjoyed spending time with each other.
2. External Professional Opinions
One thing that all team members were very much in favour of was
seeking the opinions of professional mathematicians, such as our Project
Supervisor, Dr Barrie Cooper, and an academic from the University’s
Statistics Department, Dr Ben Youngman. We believed these external
professional opinions to be critically useful in guiding our project towards
its aims. The team was thankful that Dr Barrie Cooper took the time to be
present at weekly meetings with us, so we were able to keep him constantly
updated on our progress, ideas and findings. He consistently provided us
with useful feedback, allowing us to develop and produce a higher-quality
report.
3. Individual Group Member Skill Sets
One of the aspects our group has benefitted from is the combination
of the various skill sets each member brings to the team. From our
experience of working together, we have been able to see first-hand what
skills our team members possess, so we could collaborate effectively, utilising
our skills in the most efficient way. Some of our members held very strong
programming skills, whereas other members possessed strong statistical
skills. We have all enjoyed gaining experience from each other’s varying
skills, giving us an opportunity to improve and develop new skills. This is
particularly important, as our academic studies have become much more
specialised in the final year, possibly causing us to neglect broader
employability skills. Each team member’s background, both academic and
personal, has helped to make our team a diverse and driven one.
4. Team Members’ Passion
An important strength of our project is that all members were in
some way passionate about the area on which it focused, making them far
more motivated as a result of personal interest. Furthermore, being aware
of how essential numerical assessments, and one’s ability to pass them, are
to graduate careers, we were motivated to undertake this project. A few of
us intend to go into the teaching profession, whereas others are looking to
enter the financial sector or pursue further study in computer science. Due
to the variety of skills required, every member was able to play to their own
personal strengths and interests, allowing them to get the most out of the
group project.
5.1.2 Weaknesses
On completion, we reviewed our whole project as a group and,
although we were very happy with the outcomes, there is clear room for
improvement.
1. Timing
Although overall the team worked efficiently, completing each task
by the relevant deadline and strictly following our critical path, timing was
still a weakness in this project, particularly in relation to the programming
side of our investigation. From the outset, we had highlighted the
programming as the most crucial and potentially critical aspect of our
project. Due to various problems occurring in our program, which are
explained below, and having to seek the guidance of our project supervisor,
Dr Barrie Cooper, to debug and resolve errors, time was lost at the end of
the project. This put pressure on writing up our conclusions and findings in
a much tighter timeframe. We are aware that having the success of our
project rest on one volatile task was a weakness. However, since the
programming was paramount to the data collection on which our project
was based, we are content to argue that this was unavoidable.
2. Programming Restrictions
Unfortunately, due to our level of programming skill and the
time frame provided, coding some of the questions was problematic.
This meant we had to place certain restrictions on some questions,
such as those including static images not being randomised. However,
we felt that this did little to limit our project, as only a small proportion
of our questions were affected.
3. Programme Efficiency
Our online assessment played a critical role in collecting the data
ready for analysis. To create this online assessment we first had to write,
develop and, importantly, debug the program code until it was at a working
level that we felt would efficiently fulfil its purpose. Although our code did
just this, on reflection our team felt that, given a longer period of time, we
would have benefited from learning more advanced programming skills, and
would thus have been able to improve the quality, and as a result the
potential, of our code as a vehicle to explore our research hypotheses. As
students rather than professional programmers, although we did achieve
working code much to the team’s satisfaction, it is unlikely to be the most
efficient code. For example, our code comprised a series of 48 independent
scripts that used POST variables in PHP to communicate, link and run
through each script in turn. A professional-grade system would never run
on a series of independent scripts, each dependent on the one before and
after it. There were advantages to this approach: for a small programme
like ours, it made the programming more manageable and achievable in our
time frame. The disadvantage lay in our programme’s limited potential: a
numerical assessment in a professional company could comprise thousands
of scripts, and if just one of these were edited, the entire programme would
fail until the script in question was fixed. To improve our program we could
have employed new and more advanced programming techniques by engaging
with further material, but this would have required more time and a complete
alteration of our code. We therefore felt that, while this is a weakness we
would have aimed to address to improve our project findings, doing so would
have been an unrealistic target. In addition, we felt that having a way to
stop people from navigating back through their web browser would have
benefitted our programme, as it would have prevented candidates altering
their previously submitted answers. We were aware that this could have
been achieved with some JavaScript. However, given our constraints we
were not able to address this weakness, and any changes applied once the
test had been launched would have affected the validity of our collected
data. Instead, we politely asked users to navigate through the test using
only our programme, which did not provide an option to return to previous
scripts.
4. Statistical Restraints
Since a fundamental aspect of our project was both the collection
and the analysis of data, we were required to be statistically rigorous in how
we collected it. We spoke to an academic within the Department of Statistics
at the University of Exeter to seek advice on matters such as the types of
questions we needed, and how many would give us the most useful and,
most importantly, significant data. We originally planned to program a
levelled system of 25 questions in total. The test would be split into 5
sections testing different numerical abilities, each consisting of 5 questions.
Should a candidate get a question wrong, we would give them another
question of the same level, generated from question banks; if they got it
correct, we would give them a question from a slightly more difficult bank,
restricting this process to 5 questions per section. This would have enabled
people to gain further practice in areas in which they struggle and to build
a thorough understanding before progressing. After suggesting this to the
academic, he advised strongly against it, since levels make the statistical
analysis harder due to the large number of categories: the level a candidate
reached under this system would determine their score, rather than allowing
comparison of raw scores, and it could affect our ability to measure the role
of other factors, such as age, discipline and gender. Therefore, we did not
design this system of questions, as it would have critically restricted both
our ability to analyse the data against our factors and our ability to trust
that our findings were reliable and significant. Furthermore, we were aware
that the people who took the test were probably not representative of our
target demographic. This was because our team consists primarily of
mathematics students, and most of our connections come through our
degrees, making it difficult to ensure an equal spread of respondents across
disciplines. Gender also had an effect: for example, the female members of
our group contacted more females to take our test, and a similar argument
applies for male participants. A final statistical weakness we were aware of
was the lack of effort and attention possibly paid by people taking the test.
We expected that university students would have many social engagements
and prior commitments, so the effort was unlikely to match the standard
applied when sitting a real job application test. To make it a fairer test, we
could perhaps, after researching the time of day for optimal mental
performance, have asked each candidate to sit the test at that time. We
could also have arranged set places to sit the test, to better replicate the
focused conditions people experience when sitting real tests.
5. Norm Groups
Another evaluative point is that we did not use norm groups
when evaluating participants' scores. When applying for a job, raw scores
are not considered in isolation by employers but are compared against
appropriate norm groups of similar demographics. The score of a candidate
from a given demographic would be compared against previous candidates of
that same demographic, not against everyone who took the test. By not using
norm groups, our findings may differ from those we would have obtained with
them. However, given our sample size it would not have been possible to do
this, as there was not enough data to create the groups.
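The norm-group comparison described above can be sketched in a few lines. This is an illustrative Python sketch, not part of the project's PHP code, and every score and group name below is invented for the example: the same raw score ranks very differently depending on which demographic it is normed against.

```python
# Illustrative sketch (invented data): comparing a candidate's raw score
# against a norm group of the same demographic rather than against all
# test-takers pooled together.

def percentile_in_group(score, group_scores):
    """Percentage of the norm group scoring strictly below `score`."""
    if not group_scores:
        raise ValueError("norm group is empty")
    below = sum(1 for s in group_scores if s < score)
    return 100.0 * below / len(group_scores)

# Hypothetical norm groups keyed by demographic (e.g. degree discipline).
norm_groups = {
    "mathematics": [12, 14, 15, 17, 18, 18, 19],
    "humanities":  [8, 9, 11, 11, 12, 13, 15],
}

# The same raw score of 15 sits low in one group and high in the other.
print(percentile_in_group(15, norm_groups["mathematics"]))  # ~28.6
print(percentile_in_group(15, norm_groups["humanities"]))   # ~85.7
```

This also makes concrete why our sample was too small for the technique: each demographic cell needs enough scores to make its percentile meaningful.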
5.1.3 Opportunities
1. More Adaptive
As explained in detail in the 'Programme Efficiency' point of the
'Weaknesses' section, we see clear opportunities to develop our code
further. In particular, we wished to make our programme more adaptive.
Adaptive tests work by changing the questions a candidate faces based on
their performance. Motivated by this, we could strive to develop code able
to detect even slight differences between two candidates' abilities and
give them differently levelled questions as a result. This would tailor the
assessment for greater benefit to the candidate, giving practice on the
questions they struggle with. A more advanced programme could use many
variables to pick up on variations in ability; for example, even if two
candidates both answered correctly, recording the time each took could let
the programme determine which has the better ability and assign relevant
questions accordingly. We did record the time candidates took to answer
questions, but this was done manually, purely for data analysis, and was
not an automatic function of the programme; automating it is something we
see an opportunity to develop. Further variables include recording whether
a participant selects a wrong answer first but then changes it. Of course,
all of these variables would be subject to statistical scrutiny, which we
would endeavour to investigate with an appropriate academic in Statistics
at the University.
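The adaptive rule described above can be sketched as follows. This is an illustrative Python sketch, not the project's PHP implementation: the level names match the KS2/KS3/GCSE levels used in our test, but the 30-second "fast answer" threshold is a hypothetical refinement using response time, as proposed above, to separate two candidates who both answered correctly.

```python
# Illustrative sketch (hypothetical threshold): pick the next question's
# difficulty from the current result. A correct, quick answer moves the
# candidate up a level; a correct but slow answer consolidates at the same
# level; a wrong answer repeats the level for further practice.

DIFFICULTY_LEVELS = ["KS2", "KS3", "GCSE"]   # the three levels used in our test
FAST_ANSWER_SECONDS = 30                     # hypothetical speed threshold

def next_level(current, correct, seconds_taken):
    """Return the difficulty level for the next question."""
    i = DIFFICULTY_LEVELS.index(current)
    if not correct:
        return current                       # repeat the level for practice
    if seconds_taken > FAST_ANSWER_SECONDS:
        return current                       # right but slow: consolidate first
    # right and quick: move up, capped at the hardest level
    return DIFFICULTY_LEVELS[min(i + 1, len(DIFFICULTY_LEVELS) - 1)]
```

For example, `next_level("KS2", True, 20)` promotes the candidate to KS3, while `next_level("GCSE", True, 10)` keeps them at the top level.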
2. Extensions to Literacy
We see great potential for our assessment to be extended to
literacy as well as numeracy. This addition would be relatively simple to
implement, since we have already written a basic format that the literacy
code could follow. Extending our test to literacy would be an extra way to
offer candidates a well-rounded assessment, enabling them to gain
comprehensive feedback on the two key skills, numeracy and literacy, which
are tested heavily in the application processes of the working world.
Furthermore, we feel there is an opportunity to undertake more advanced
statistical research, asking questions such as 'Do mathematics students
achieve better results in the numeracy test than in the literacy test,
compared with English students?' or 'Do students of subjects that involve
equal amounts of literacy and numerical skill, such as Business Studies,
achieve relatively equal scores in both tests?' There are many stereotypes
about the skills held by students of various disciplines, so extending our
test to literacy would help us uncover whether these are true, and possibly
find evidence of new and unexpected trends, for example that English
students achieve better results in numerical tests on the whole than
mathematics students do.
3. Publication
As a team we are very proud of our project and feel that our
statistical analysis and findings make some contribution to the academic
literature in this area. Given more time and the opportunity to work
further with our project supervisor, Dr Barrie Cooper, and the University
of Exeter's Statistics Department, we would be interested in pursuing this
area of research further. This would allow us to produce a more detailed
and comprehensive study of numerical tests, extend our assessment to
literacy, and deepen both our analysis and the potential real-world
significance of our findings. We could then prepare a paper ready for
publication in an appropriate academic journal.
4. Links with Career Zone to share our resource
The Career Zone forms an essential support network, primarily
there to assist students in gaining the careers to which they aspire.
Career Zone understands how important a student's ability to perform in
numerical tests is for securing the best internships, placements and jobs.
Rowanna Smith, the lead Careers Consultant for the College of Engineering,
Mathematics and Physical Sciences, took great interest in the motivation
for our research project. We therefore see an opportunity to take our new
resource to the Career Zone team and work with them to offer it through
their website, or internally, as a resource for students seeking help with
numerical reasoning tests. We see this as a very rewarding way to benefit
the university and its students: not only indirectly through the wider
implications of our findings, but also directly for those students who
participated in our studies.
5.1.4 Threats
1. Timing
Timing was an obstacle we faced constantly throughout the
development of our project. Various team members had prior commitments and
so were unable to attend certain meetings. This meant that not everyone was
always aware of all the progress being made, causing occasional confusion,
particularly about the direction our project was taking and, more
specifically, which tasks needed to be carried out as a result. It should
be noted, however, that thanks to various social platforms, anyone who was
absent had several ways of catching up and staying up to date.
2. University Internet not working
Our ability to use, access and set live our numerical
assessment relied heavily on the University's internet infrastructure. The
SQL database in which we stored the data entered by candidates sitting the
test was kept on the university's server, along with the final programming
code. If the server shut down, whether through a fault or for maintenance,
our programme could not be accessed by students wishing to sit it, and the
data it generated could not be downloaded from the servers for analysis. We
relied on the university backing up its files through professional methods.
The occasion that most prominently highlights the detrimental effect of the
university's server failures came when we first attempted to make our
programme and website go live. We were under time pressure to get the
website with the assessment live; however, when we met with Dr Barrie
Cooper to set it up, the university server had shut down due to an error
the university was investigating. We had to call off the meeting, which set
back the date on which our programme went live and left us with less time
both to get people to take the test and to analyse the data.
3. Programme Failure
Those of us experienced with programming were aware that one
of the most time-consuming parts of creating a working programme is
debugging it. Debugging is the term used in computer science for
identifying and removing errors from computer hardware or software
[16] [17]; in our case it was software. Throughout the development of our
programme we encountered countless errors. These formed significant
obstacles critical to the success of our project, as failure to resolve
them would have left us with no method of collecting data. One example was
ensuring the correct variables were passed via POST through the PHP code,
which allowed us to randomise our assessment. Furthermore, just before our
assessment went live we had to debug code that was not printing certain
characters correctly, such as pound signs and apostrophes, and we had to
resolve issues with data not being successfully inserted into the database.
Dr Barrie Cooper was a great help in resolving these programme failures.
4. Student Participation
A big obstacle that we anticipated, and which duly arose, was
getting enough students to participate in our research by taking our
assessment. We found it a great challenge to recruit enough participants,
and those who did take part did not always follow the protocol we provided;
some navigated back via their browser or used a search engine to look for
advice on how to solve the questions. As a group, we believe one way to
encourage more students to take the test, and to follow the protocol more
rigorously, would have been to offer some kind of incentive.
5.2 Improvements
The first improvement we could make to our project is an adaptive
test in which the candidate progresses through levels of difficulty
depending on their performance. Our initial plan was to create a test where
the next question depends on whether the previous question was answered
correctly: a correct answer would move the candidate onto the next level of
difficulty, while an incorrect answer would give them another question at
the same level. Two people taking the test could therefore end up answering
very different questions. After speaking to Dr Ben Youngman, we found that
this would seriously affect the statistical analysis we had planned,
because the difficulty level of the questions would affect the score more
than any other factor. Instead, we separated the test into three levels,
KS2, KS3 and GCSE, with every candidate taking the same questions. To
improve the project, we could have researched this problem further and
tried to find a way of analysing data from an adaptive test.
Our set questions were randomised by varying the values
within each question. We were unable to randomise the numbers in the pie
charts because, within our time limit, we could not work out how to do this
in our PHP code. One improvement to our numerical reasoning assessment
would be to randomise these as well. In addition, we could have used a
larger variety of charts, for example line graphs or bar charts.
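The value-randomisation idea above can be sketched as follows. This is an illustrative Python sketch rather than our PHP code, and the question wording, number ranges and percentage choices are all invented: the point is that each candidate sees the same question structure with different numbers, so answers cannot simply be shared.

```python
# Illustrative sketch (invented template and ranges): randomising the
# values inside a fixed pie-chart question while keeping the answer a
# whole number.
import random

def pie_chart_question(rng):
    """Generate one randomised pie-chart question and its correct answer."""
    total = rng.randrange(200, 1001, 100)        # a multiple of 100 responses
    percent = rng.choice([10, 20, 25, 40, 50])   # slices giving whole answers
    question = (f"A pie chart shows the {total} responses to a survey. "
                f"One slice represents {percent}% of the chart. "
                f"How many responses does that slice represent?")
    answer = total * percent // 100
    return question, answer

rng = random.Random(42)      # seeded so a test run is reproducible
q, a = pie_chart_question(rng)
```

Seeding the generator, as in the last two lines, is a convenience for testing; a live test would use an unseeded generator per candidate.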
A further benefit to our project would have been an expansion of
our sample, meaning more participants taking our assessment. This would
have increased our sample size and better represented the whole population.
A final improvement would have been to give feedback in a form
suited to the type of learner taking the test. For example, an auditory
learner would have benefited from an audio recording explaining how to
answer the question, while a kinaesthetic learner would have benefited from
interactive, physical feedback rather than the written step-by-step
solution to the previous question. Our written feedback was suited mostly
to visual learners.
5.3 Further Research
The first way we could further our research is to look into
potential cheating. Our numerical reasoning test was not taken under exam
conditions, so it was not possible to prevent participants from cheating.
However, this is also the case with most numerical reasoning tests used by
employers, so it may be unnecessary to mitigate. The tests also do not
reveal whether a participant calculated the answer or simply made a lucky
guess: on a multiple-choice question with five options, a participant has a
20% chance of guessing correctly. We could further our study of the data by
creating a criterion to detect cheating and performing statistical analysis
on it. Could someone actually read, understand and answer a question in
under 4 seconds? We would need to pose, and answer, the question 'What is
the minimum response time in seconds before a participant is flagged as
cheating?' With additional research we could assess whether this could be
modelled, for example through a Poisson distribution, which models random
events occurring at a constant rate.
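The flagging criterion and the lucky-guess baseline above can be sketched together. This is an illustrative Python sketch, not our project code; the 4-second cut-off is the hypothetical threshold discussed in the text, and the response times are invented.

```python
# Illustrative sketch (invented data and threshold): flag answers given
# faster than a plausible minimum reading time, and compute the baseline
# probability of a lucky-guess streak on 5-option multiple choice.

MIN_PLAUSIBLE_SECONDS = 4      # hypothetical cut-off discussed in the text

def flag_suspicious(response_times):
    """Indices of answers given implausibly quickly."""
    return [i for i, t in enumerate(response_times) if t < MIN_PLAUSIBLE_SECONDS]

def p_all_lucky(n_questions, n_options=5):
    """Chance of guessing every one of n multiple-choice questions correctly."""
    return (1.0 / n_options) ** n_questions

times = [35.2, 3.1, 48.0, 2.5, 60.4]   # invented response times in seconds
print(flag_suspicious(times))           # -> [1, 3]
print(p_all_lucky(3))                   # 0.2 cubed = 0.008
```

A fuller analysis would replace the fixed threshold with a fitted model of response times, which is where the Poisson-style modelling mentioned above would come in.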
As nobody took our test more than once, we were unable to
evaluate whether practising numerical reasoning tests improves scores. We
could therefore further our research by having participants retake our test
and determining whether their scores improved.
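The retake comparison would naturally be a paired analysis, since each retake score belongs to a specific first attempt. The sketch below is illustrative Python with invented scores; a real study would follow the mean difference with a formal paired significance test.

```python
# Illustrative sketch (invented scores): the mean of the paired differences
# (retake minus first attempt) estimates the practice effect.

def mean_improvement(first_scores, second_scores):
    """Mean of (retake - first attempt) over paired candidates."""
    if len(first_scores) != len(second_scores) or not first_scores:
        raise ValueError("need equal-length, non-empty paired samples")
    diffs = [b - a for a, b in zip(first_scores, second_scores)]
    return sum(diffs) / len(diffs)

first  = [12, 15, 9, 18, 14]    # hypothetical first-attempt scores
retake = [14, 15, 12, 19, 15]   # hypothetical retake scores
print(mean_improvement(first, retake))   # -> 1.4
```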
Although we timed how long participants took over our
numerical reasoning test, there was no time limit and we did not show
participants their overall time. Most online tests in job applications
impose a time limit, usually shown in the corner of the webpage. We could
further our research to see whether this affects people's scores, whether a
time limit makes participants nervous and causes them to panic and perform
worse, or whether time pressure improves performance through increased
concentration. To investigate this, we could ask people to take the test
with a time limit and compare the statistics with those from our original
test, for example by calculating whether the average score or standard
deviation changed.
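The comparison proposed above amounts to computing the mean and standard deviation of each group of scores and looking at the gap. The sketch below is illustrative Python with invented samples; a real comparison would also test whether the gap is statistically significant.

```python
# Illustrative sketch (invented samples): comparing average score and
# spread between an untimed and a timed sitting of the test.
from statistics import mean, stdev

untimed = [14, 16, 12, 18, 15, 13]   # hypothetical scores, no time limit
timed   = [12, 15, 10, 17, 13, 11]   # hypothetical scores, with time limit

print(mean(untimed), stdev(untimed))
print(mean(timed), stdev(timed))
# A positive gap would suggest time pressure lowered scores;
# a negative gap would suggest it sharpened concentration.
print(mean(untimed) - mean(timed))
```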
6 Bibliography
References
[1] Numeracy in practice: teaching, learning and using mathematics. Cur-
riculum and Leadership Journal, 7(28):5, 2009.
[2] May 2013.
[3] Anders, Y., Rossbach, H., Weinert, S., Ebert, S., Kuger, S., Lehrl, S.,
von Maurice, J. Home and preschool learning environments and their
relations to the development of early numeracy skills. Early Childhood
Research Quarterly, 2011.
[4] P. Bell. Why and how to use psychometric testing. Recruiter, August
2015.
[5] Draper, N., Smith, H. Applied Regression Analysis, Third Edition.
Wiley-Interscience Publication.
[6] Bynner, J., Parsons, S. Use It or Lose It. The Impact of Time out of
Work on Literacy and Numeracy Skills. Basic Skills Agency, 1998.
[7] Bynner, J., Parsons, S. Qualifications, basic skills and accelerating
social exclusion. Journal of Education and Work, 2001.
[8] Products CEB, SHL Talent Measurement. Assessment types.
[9] SHL Talent Measurement CEB. Aptitude. identify the best talent faster
and at less cost.
[10] S. Cook. Kinesthetic learning styles: 24 activities for teaching.
[11] S. Coughlan. University applications hit record high.
[12] B. Dattner. How to use psychometric testing in hiring. harvard business
review, human resource management.
[13] Michelle E. Davis. Learning PHP and MySQL. O’Reilly, 2006.
[14] Department of Mathematics, New York University. Why study mathe-
matics.
[15] The University of Arizona Department of Mathematics. Why study
mathematics.
[16] Oxford Online Dictionary. Definition of computer hardware.
[17] Oxford Online Dictionary. Definition of ’software’.
[18] Oxford Online Dictionary. Definition of mathematics, November 2015.
[19] Oxford Online Dictionary. Definition of numeracy, November 2015.
[20] Eastwood, K., Boyle, M., Williams, B., Fairhall, R. Numeracy skills of
nursing students. Nurse Education Today, 2010.
[21] J. Faraway. Practical Regression and ANOVA using R. July 2002.
[22] Department for Education. Mathematics GCSE subject content and as-
sessment objectives. Online PDF file, 2013.
[23] Department for Education. Mathematics programmes of study, key stage
3: national curriculum in England. Online PDF file, September 2013.
[24] Department for Education. Mathematics programmes of study, key
stages 1 and 2: national curriculum in England. Online PDF file, Septem-
ber 2013.
[25] Department for Education. The national curriculum in England, key
stages 3 and 4 framework document, December 2014.
[26] R. Garner. EY: firm says it will no longer consider degrees or A-level
results when assessing employees. The Independent.
[27] The Guardian. GCSE results day 2015: pass rates rise as UK students
find out grades – as it happened.
[28] The Guardian. Maths teaching revolution needed, November 2013.
[29] D. Hallet. How is numeracy different from elementary mathematics?
University of Arizona, Harvard University, November 2014.
[30] K. Herbert. Bias in personnel selection and occupational assessments:
Theory and techniques for identifying and solving bias. International
Journal of Psychology and Counselling, 5(3):38–44, 2013.
[31] Inside Careers, Specialists in Graduate Careers. Assessment centres,
numerical reasoning tests.
[32] M. Zuckerberg, Facebook founder, interview. Facebook has a billion
users in a single day, says Mark Zuckerberg. BBC, August 2015.
[33] A. Jenkins. Companies' use of psychometric testing and the changing
demand for skills: A review of the literature. Centre for the Economics
of Education, London School of Economics and Political Science, 2001.
[34] S. Jordan. Assessment for learning: pushing the boundaries of com-
puter based assessment. Assessment in Higher Education Conference,
Cumbria, Centre for Open Learning of Mathematics, Computing, Sci-
ence and Technology (COLMSCT), The Open University, pages 1–12,
July 2008.
[35] K. Lepi. The 7 styles of learning, which works for you. November 2012.
[36] R. Lowry. Chemistry education research and practice, computer aided
self assessment, an effective tool. The Royal Society of Chemistry (RSC),
6:198 – 203, July 2005.
[37] Assessment Day Ltd. Numerical reasoning test.
[38] Meadows, P., Metcalf, H. Does literacy and numeracy training for adults
increase employment and employability? Evidence from the Skills for Life
programme in England. Industrial Relations Journal, 39(5):354–369,
September 2008.
[39] Mogey, N., Watt, H. Implementing learning technology - the use of com-
puters in the assessment of student learning. Learning Technology Dis-
semination Initiative, pages 50–57, 1996.
[40] C. Neuhauser. Learning style and effectiveness of online and face-to-
face instruction. American Journal of Distance Education, 16(2):99–113,
2002.
[41] National Numeracy. Why is numeracy important.
[42] J. O’Donoghue. Numeracy and mathematics. Department of Mathe-
matics and Statistics, University of Limerick, Ireland, Irish Math. Soc,
(Bulletin) 48:47–55, 2002.
[43] OECD, Better Policies for Better Lives. Education report.
[44] Test Partnership. Numerical reasoning (n-ara).
[45] Prospects.
[46] P. Robinson. Literacy, numeracy and economic performance. New Po-
litical Economy, 2007.
[47] C. Rowlands. How organisations get the best out of psychometric test-
ing. Personnel Today in association with Network HR, 2015.
[48] Learning Rx. Types of learning styles. 2015.
[49] Telegraph Staff. Employers receive 39 applications for every graduate
job.
[50] Kinesthetic Learning Strategies. What are the best kinesthetic learning
strategies.
[51] R. Stretton. Talent and capability consultant at rsa insurance. Email
Conversation, October 2015.
[52] ESL Kids Stuff.
[53] BBC Recruitment Team. Recruitment. Email Response.
[54] TES. Algebra - levelled SATs questions, July 2014.
[55] Mind Tools. SWOT analysis.
[56] A. Tucker. What is important in school mathematics. Technical re-
port, Department of Applied Mathematics and Statistics, Stony Brook
University, 2015.
[57] Indiana University. Academic enrichment. 2015.
[58] R. Vosburgh. The evolution of hr. developing hr as an internal consulting
organization. Human Resource Planning, 30(3):11, 2007.
[59] w3schools.com. Php 5 tutorial.
[60] Wylie, G., Head of the Mathematics Department, St Teilo's CW High
School, Cardiff. Information on potential mathematics GCSE reforms.
Verbal conversation, June 2015.
7 Appendix
The following appendix is a compilation of extra files that we believe are
not essential to this report but may still be both interesting and useful
for the reader.
Our survey comprises the following series of images.
Figure 29: Survey Pages.
Figure 30: Survey Pages.
Figure 31: Survey Pages.
Figure 32: Survey Pages.
Figure 33: Survey Pages.
Figure 34: Survey Pages.
Figure 35: Survey Pages.
Figure 36: Survey Pages.
Figure 37: Survey Pages.
Figure 38: Survey Pages.
Our numerical reasoning assessment comprises the following images.
Figure 39: Opening page of our Numerical Reasoning Assessment.
Figure 40: Participant Information Page for our Numerical Reasoning
Assessment.
Figure 41: Question 1.
Figure 42: Question 1 Feedback.
Figure 43: Question 2.
Figure 44: Question 2 Feedback.
Figure 45: Question 3.
Figure 46: Question 3 Feedback.
Figure 47: Question 4.
Figure 48: Question 4 Feedback.
Figure 49: Question 5.
Figure 50: Question 5 Feedback.
Figure 51: Question 6.
Figure 52: Question 6 Feedback.
Figure 53: Question 7.
Figure 54: Question 7 Feedback.
Figure 55: Question 8.
Figure 56: Question 8 Feedback.
Figure 57: Question 9.
Figure 58: Question 9 Feedback.
Figure 59: Question 10.
Figure 60: Question 10 Feedback.
Figure 61: Question 11.
Figure 62: Question 11 Feedback.
Figure 63: Question 12.
Figure 64: Question 12 Feedback.
Figure 65: Question 13.
Figure 66: Question 13 Feedback.
Figure 67: Question 14.
Figure 68: Question 14 Feedback.
Figure 69: Question 15.
Figure 70: Question 15 Feedback.
Figure 71: Question 16.
Figure 72: Question 16 Feedback.
Figure 73: Question 17.
Figure 74: Question 17 Feedback.
Figure 75: Question 18.
Figure 76: Question 18 Feedback.
Figure 77: Question 19.
Figure 78: Question 19 Feedback.
Figure 79: Question 20.
Figure 80: Question 20 Feedback.
Figure 81: Closing page of our Numerical Reasoning Assessment.
The following series of images comprises screenshots of the R code used for
our statistical modelling within this report.
Figure 82: Screen print of R code used for statistical analysis.
Figure 83: Screen print of R code used for statistical analysis.
Figure 84: Screen print of R code used for statistical analysis.
Figure 85: ANOVA for initial model.
Figure 86: ANOVA for model created by stepwise regression.

final

  • 1.
    University of Exeter Collegeof Engineering, Mathematics and Physical Sciences ECM3735 Mathematics Group Project Computer Assessment - The Challenges and Potential Solutions Authors: Candidate Numbers 003440, 035429, 006702, 000997, 019169, 008339, 011812, 006667. Advisor: Dr. Barrie COOPER
  • 2.
    College of Engineering,Mathematics and Physical Sciences Harrison Building Streatham Campus University of Exeter North Park Road Exeter UK EX4 4QF Tel: +44 (0)1392 723628 Fax: +44 (0)1392 217965 Email: emps@exeter.ac.uk December 7, 2015
  • 3.
    Abstract The purpose ofthis report is to explore the challenges and potential solutions of current computer-based assessments. With increasing numbers of applications for graduate jobs, there exists a growing pressure among applicants to succeed at online assessments set by employers. The vast number of applications received, compared to available positions, puts an even greater need for employers to de- velop effective and fair assessments. These can then identify the most appropriate candidates who are able to best demonstrate their abilities in numerical reasoning, which have been shown to be a reliable predic- tor of job performance. In our report we approach the four questions: How do people learn through computer-based assessment? Why is it important to study mathematics? The Numeracy Vs. Mathematics debate. Why do certain employers use numerical testing? Are certain types of learners better at numerical reasoning tests? By creating our own numerical reasoning test, we hoped to explore the factors that affect participant’s performance. The team carried out extensive sta- tistical analysis hoping to relate our findings back to our hypotheses. We found significant findings for all four of our proposed hypotheses. The overall findings of this report demonstrate that current numerical reasoning assessments and practice tests are potentially flawed. Our findings suggest that they fail to accommodate to all types of learners, and in most cases fail in providing comprehensive feedback. From our research and test findings we encourage companies and educational in- stitutions to take on board our recommendations, such as to improve both the feedback and preparation they offer to candidates.
  • 4.
    Contents 1 Introduction 3 1.1Aims and Objectives . . . . . . . . . . . . . . . . . . . . . . . 3 1.1.1 Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1.2 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.2 Preliminary Findings . . . . . . . . . . . . . . . . . . . . . . . 5 1.2.1 The ’Mathematics Vs Numeracy’ debate. . . . . . . . . 5 1.2.2 Why is mathematics important? . . . . . . . . . . . . . 7 1.2.3 Do people forget mathematics skills as they get older? 8 1.2.4 Why do certain employers use numerical reasoning as- sessments? What skills do they think it will show? . . 9 1.2.5 How do people learn through computer-based assess- ment? What works and what does not? . . . . . . . . . 14 1.2.6 Will different types of learners (kinaesthetic, visual etc.) have different levels of numeracy? . . . . . . . . . 15 2 Methodology 16 2.1 Group Organisation . . . . . . . . . . . . . . . . . . . . . . . . 16 2.1.1 Meeting Times . . . . . . . . . . . . . . . . . . . . . . 16 2.1.2 Communication . . . . . . . . . . . . . . . . . . . . . . 17 2.1.3 Combatting Risk . . . . . . . . . . . . . . . . . . . . . 18 2.1.4 Subgroups . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.2 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . 20 2.2.1 Preliminary data collection . . . . . . . . . . . . . . . . 20 2.2.2 Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.3 Test design, creation and analysis . . . . . . . . . . . . . . . . 23 2.3.1 Producing the Questions . . . . . . . . . . . . . . . . . 23 2.3.2 Programming the Test . . . . . . . . . . . . . . . . . . 25 2.3.3 Test distribution . . . . . . . . . . . . . . . . . . . . . 31 2.3.4 Test Analysis . . . . . . . . . . . . . . . . . . . . . . . 32 2.4 Report Feedback . . . . . . . . . . . . . . . . . . . . . . . . . 35 2.4.1 Skill development - Graduate Skills . . . . . . . . . . . 36 3 Findings 38 3.1 Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. 38 3.2 Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 1
  • 5.
    3.2.1 The MathsVs. Numeracy Debate. Why is mathemat- ics important? . . . . . . . . . . . . . . . . . . . . . . . 47 3.2.2 Why do employers use numerical reasoning testing? . . 50 3.2.3 Do different learners perform better on numerical rea- soning tests? . . . . . . . . . . . . . . . . . . . . . . . . 52 3.2.4 How do people learn through computer-based assess- ments? . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 3.2.5 Regression Modelling . . . . . . . . . . . . . . . . . . . 56 3.3 Feedback Findings . . . . . . . . . . . . . . . . . . . . . . . . 61 4 Conclusion 63 5 Evaluation 66 5.1 SWOT Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 66 5.1.1 Strengths . . . . . . . . . . . . . . . . . . . . . . . . . 66 5.1.2 Weaknesses . . . . . . . . . . . . . . . . . . . . . . . . 67 5.1.3 Opportunities . . . . . . . . . . . . . . . . . . . . . . . 71 5.1.4 Threats . . . . . . . . . . . . . . . . . . . . . . . . . . 73 5.2 Improvements . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 5.3 Further Research . . . . . . . . . . . . . . . . . . . . . . . . . 75 6 Bibliography 77 7 Appendix 81 2
  • 6.
    1 Introduction In recenthistory our world has bared witness to some of the most revolutionary and exciting technological advances of all time. We live in an age where computers seem to hold a role in society on par with that of the basic necessities such as food or water. Our planet no longer revolves merely around the sun, but around all things computer related. These advancements have caused a great evolution in human society; we have gone from a very much physical world to a more paperless and virtual one. This is even true for assessments. Nowadays companies require candidates to be assessed via online tests as opposed to the traditional paper and pen exam. This report endeavoured to explore such computer-based assessments. In particular, we have looked at numeracy tests exploring both the challenges they face and the potential solutions. 1.1 Aims and Objectives 1.1.1 Aims Computer-based assessments are widely used by employers, govern- ment departments and educational organisations, the list nowadays is endless. The question is why? First and foremost, we have examined the aptitude of a candidate in a particular subject. Two common subjects that have been examined via online assessments are numeracy and literacy, which are two key skills for employability, the applied skills of the subjects - Mathematics and English respectively. There is a definite need for candidates to both learn and improve from these tests. Motivated by this fundamental necessity we have focused on creating our own form of computer-based assessment as part of our project. We have used this as a vehicle with which to answer several questions that target the way in which a person learns. This has been done by analysing data from a sample of people who had taken our test. 
We have carried out research around this subject area of computer-based assessments and have used our data to compare our findings with the current literature available and thus provided insight into online testing that can be useful to both universities and employers. 3
  • 7.
    1.1.2 Objectives Our objectiveswere clear. Firstly, we have researched extensively into literature relating to education and current learning methods. This lit- erature has enabled us to understand the theory behind the ways in which people learn and improve. Our research on this was both general and specific to computer-based assessments. Furthermore, we have explored the types of learners that exist. Once these were clearly classified we were able to explore effective learning methods catered specifically to them. This acted as an im- portant step since the aim had been to create an assessment which provided effective and comprehensive feedback. It had been planned to develop a computerised numerical reasoning test with which we gathered the necessary data, in order to support or con- tradict the literature and current research hypotheses that exist within the community. We had planned to incorporate an adaptive feature into our as- sessment which has been crucial in equipping our assessment with the ability to tailor questions to an individual’s ability based on their performance in the previous questions. This has not only aided our statistical analysis but has also provided participants with the relevant practice and training that they require. It has also been planned to provide real time feedback which will allow participants to instantly identify where they went wrong and more importantly how they were able to correct this. The test was primarily a numerical reasoning test and so incorpo- rated numerical skills similar to those tested in graduate job applications. We therefore felt it would be beneficial to research the question “Why test mathematics?” These findings shed light on exactly why employers incorpo- rate such assessments into their application process and what they hope to discover through doing so. Age is a factor which uniquely defines a person. 
In our project, we have explored whether there is evidence to support the idea that mathematics skills deteriorate when you stop using them; that is, that mathematics skills are not permanent and require regular revision to remain in one's memory. This period of disuse could be the years after GCSE or A Level. We have also been able to examine whether people who study mathematics have an advantage over others. All of these findings have been compared against the current literature, so that after analysing our test results we could highlight any supporting or contradicting trends.
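The adaptive feature described in our objectives can be illustrated with a minimal sketch. The difficulty levels, step size and answer sequence below are hypothetical illustrations, not the actual rules used in our test:

```python
# Minimal sketch of an adaptive question selector: the next question's
# difficulty moves up after a correct answer and down after an incorrect
# one. The 1-5 difficulty scale and single-level step are assumptions.

def next_difficulty(current, answered_correctly, lowest=1, highest=5):
    """Return the difficulty level for the next question."""
    step = 1 if answered_correctly else -1
    return max(lowest, min(highest, current + step))

# Example: a participant's (invented) run of answers and the resulting
# sequence of difficulties, starting from the middle of the scale.
difficulty = 3
for correct in [True, True, False, True]:
    difficulty = next_difficulty(difficulty, correct)
# difficulty follows 3 -> 4 -> 5 -> 4 -> 5
```

A rule of this shape is what lets the test converge on questions near a participant's ability level rather than presenting everyone with the same fixed sequence.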
There is a widespread and controversial debate, not only across the academic community but across the world, as to whether mathematics and numeracy are essentially the same thing. It poses the question of whether numeracy skills essentially rely on mathematical skills and vice versa, or whether they are in fact completely separate disciplines. We felt this was a relevant area to explore, since there has been talk of the UK Government changing the current mathematics GCSE by removing numeracy from mathematics and treating them as independent subjects, as mentioned above. We hope to discover whether mathematics students actually have an advantage in numeracy tests, given that this is a skill that is neither relevant to, nor practised at, degree level.

We have modelled the data collected from our test results with relevant graphs, and have further analysed it using appropriate statistical techniques. Modelling our data in appropriate mediums has enabled us to efficiently compare our results to those found in the literature.

Finally, we have aimed to assess how useful our findings are to our broader stakeholders. We have set out to measure to what extent, if any, we have been able to contribute to the current problem of learning through computer-based assessment. It has been an objective of ours both to highlight the problems with what is currently available and, where possible, to improve on it by providing solutions based on our findings. We have planned to approach professionals and experts in this field with our results in order to get reliable feedback.

1.2 Preliminary Findings

1.2.1 The 'Mathematics Vs Numeracy' debate

An ever greater need for both mathematical and numerical skills is constantly emerging. However, there is a big debate in society as to whether mathematics and numeracy should be considered the same thing, or whether numeracy should be a subject in its own right.
There are current plans for GCSE Mathematics in the UK to be split into two separate, independent GCSEs: Mathematics and Numeracy [60]. Mathematics is defined by the Oxford Dictionary as "the abstract science of number, quantity and space, either as abstract concepts (pure mathematics), or as applied to other disciplines such as physics and engineering (applied mathematics)" [18]. Numeracy, meanwhile, is defined by the Oxford Dictionary as "the ability to understand and work with numbers" [19]. It may be concluded from these two definitions that numeracy is a subset of mathematics. However, it can also be argued that numeracy is a subject in its own right and should be separated from mathematics, as it is more applicable in society and the workplace.

Interestingly, a paper on numeracy and mathematics from the University of Limerick, Ireland, contained no universally accepted definition of numeracy [42]. This is backed up by research from the University of Arizona, which found that the difference between numeracy and elementary mathematics is analogous to the difference between quantitative literacy and mathematical literacy [29]. More importantly, no universal definition of numeracy was agreed upon, although there was much overlap between current working definitions. The most important difference between the two forms of literacy is that quantitative literacy puts more emphasis on context, whilst mathematical literacy focuses on abstraction [29].

In a paper produced by Stony Brook University's Department of Applied Mathematics and Statistics, it is stated that all mathematics instruction should be devoted to developing "deeper mastery" of core topics through computation, problem-solving and logical reasoning, which is effectively what a numerical reasoning test examines. Simple proportion problems can be integrated into fraction calculations early on. In addition, the development of arithmetic skills in working with integers, fractions and decimals should be matched with increasingly challenging applied problems, many in the context of measurement. Solving problems in different ways ought to be an important aspect of mathematical reasoning, with both arithmetic and applied problems, in order to ensure a sufficient level of numerical skill for further progression in society [56].
The Guardian newspaper produced an article exploring a worldwide problem concerning the difference between mathematics in education and mathematics in the real world. The article states that, all over the world, we are mostly teaching the wrong type of mathematics [28]. The Guardian went on to describe how computers now carry out most calculation, and yet we still train people primarily to calculate; this is true almost universally [28]. We can relate this to the 'Mathematics Vs Numeracy' debate since, generally, the mathematics taught in education is too pure and too distant from the real world, while on the whole the mathematics used in everyday life is numeracy.

Many companies require potential employees to sit a numeracy test
before commencing employment, despite the fact that they already hold nationally recognised qualifications in mathematics. An article from an electronic journal for leaders in education explores this. Its findings show that, although the term "numeracy" is not widely used across the world, there is a strong consensus that all young people need to become more competent and confident in using the mathematics they have been taught. Furthermore, numeracy is found to bridge the gap between school-learned mathematics and its applications in everyday life [1]. These findings support companies in their use of numerical reasoning testing as a way of seeing whether candidates can efficiently apply their formal qualifications in a practical environment. A candidate may have achieved high results in their school exams, but this does not necessarily mean that they will be able to use their qualifications for practical problem solving, which is recognised as the main use of mathematics [28].

An insufficient level of numeracy skills has been found to lead to unemployment, low wages and poor health, further highlighting the importance of numeracy [43]. The need for mathematics exists in all aspects of everyday life, within the workplace and in other practical settings such as schools, hospitals and the news, and in understanding any statistics [14].

1.2.2 Why is mathematics important?

The study of mathematics can lead to a variety of professional careers, such as research, engineering, finance, business and government services [14]. This is supported by the University of Arizona's Department of Mathematics, which also adds the social sciences to the above fields [15]. These careers are fundamental to the world's economy; it is therefore important to ensure that people working within them have sufficient skills to guarantee correct and efficient problem solving and to prevent any detrimental consequences.
Finally, it has been suggested that poor numeracy leads to depression, low confidence and low self-esteem, which in turn lead to social, emotional and behavioural difficulties, increasing social exclusion, truancy and crime rates [41]. In the digital age, 90% of new graduate jobs require a high level of digital skills, which are built on numeracy. Although computers are able to reduce the need for human involvement in certain calculations, sufficient numeracy skills are still required to use them efficiently [41].
1.2.3 Do people forget mathematics skills as they get older?

Research has found that a severe loss of both numeracy and literacy skills often occurs in adulthood, with 20% of adults experiencing difficulties with the basic skills needed to function in modern society [6] [20] [38]. Simple numerical calculations, such as percentages and powers, are found difficult despite being taught and tested to the government's standard throughout education. The effect of unemployment has been explored for both men and women, and it has been found that numeracy skills get steadily worse the longer a person is without a job [6]. Interestingly, women experience a lesser effect than men because their role in society is more diverse, requiring them to use their numeracy and literacy skills more frequently. It has also been found that the loss of skill largely depends on the starting level of knowledge and understanding: those who have poor skills to begin with experience a more severe deterioration. Furthermore, numeracy skills have a smaller presence in everyday life, as more people find themselves reading than performing calculations. A decrease in literacy skills, however, leads to an even further loss of numeracy skills, as it increases the difficulty of understanding the posed question.

Important findings have been made amongst a group of nursing students who were asked to sit a numeracy test containing questions similar to those they will have to answer as part of their future job [20]. The average score was 56%, with the most common type of error being arithmetic. A significant difference in results was found between students who began higher education immediately and those who took a year out beforehand: those who started immediately scored 70% on average, while those who did not averaged only 47%.
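The kind of arithmetic these nursing numeracy tests cover can be sketched with the standard dosage formula (dose required divided by dose in stock, times the stock volume). The drug quantities below are invented, and the plausibility check is our own suggestion for catching the unreasonable answers that students were found to accept unquestioned:

```python
# Illustrative drug-dosage calculation with a basic sanity check.
# All numbers are hypothetical; the formula is the standard
# (dose required / dose in stock) * stock volume.

def volume_to_give(required_mg, stock_mg, stock_volume_ml):
    """Volume (ml) to administer for the required dose."""
    return required_mg / stock_mg * stock_volume_ml

# Example: 250 mg required, stock solution holds 500 mg in 10 ml.
volume = volume_to_give(required_mg=250, stock_mg=500, stock_volume_ml=10)

# A simple reasonableness check: flag any answer outside a plausible
# range (here an assumed 0-20 ml) instead of administering it blindly.
plausible = 0 < volume <= 20
```

Checking a computed answer against an expected range in this way is exactly the error-spotting skill that the study found students lacked.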
This shows that being in an environment that does not require the use of numeracy skills has a deteriorating effect, not only on the ability to perform simple calculations but also on the ability to extract the relevant information needed to set up an equation. This means that, even with a calculator, these students are still likely to make mistakes. Students were also found to be unable to identify errors in their work, even when the result was unreasonable and unrealistic. Such results are potentially dangerous: nursing students, for example, must perform calculations such as drug dosages which, if incorrect, cost both the public, in terms of patient harm, and the employer, in having to provide additional training.

Due to the ever increasing importance of skills in the world of work,
especially early on in a career, a lack of numerical competence has an undesirable effect on the employment of these individuals, which in turn affects their standard of living [6] [46] [38]. Such requirements are brought about by recent changes to the labour market, with fewer semi-skilled or unskilled manual jobs available due to technological developments [46]. Unskilled workers have difficulty both gaining and retaining employment, and so are the first to suffer in the case of downsizing or a crisis [6]. A low-level skill set also limits individuals to lower- and middle-range jobs (the bottom 10% to 20%), preventing them from experiencing career growth and leading to severe social exclusion [46] [6] [7]. This causes a downward cycle, as a low skill level is passed on from parents to children, accelerating unemployment through the generations [7]. The government has recognised this problem and created the 'Skills for Life' programme, which aims to provide basic skills to adults in order to help them gain employment [38]. Other solutions include on-the-job training or, as research suggests, preventing such severe skill loss in the first place by ensuring pupils reach a certain skill level whilst still in education [6].

There are, of course, other factors which lead to a low level of numeracy skills, such as family background, learning environment and quality of education [6] [3]. However, in this report we concentrate on how the low level of demand for numeracy in everyday life affects a student's performance in an online test.

1.2.4 Why do certain employers use numerical reasoning assessments? What skills do they think it will show?

With a constantly changing and advancing business world, the way in which people are hired may be a natural result of shifts in the business environment and modern workforces. A number of studies mentioned in A.
Jenkins' 2001 paper speculated that the increase in numerical testing is due to the greater professionalism of the human resources sector of many businesses, as well as the inclusion of standard selection procedures in their operations [33]. In the 21st century, Human Resources (HR) has evolved considerably and is now an integral part of most organisations [58]. All of these factors may have contributed to the rise of assessment centres, driven by a continuous desire amongst companies to gain a professional edge. They do this by searching for alternatives to traditional methods of recruitment, much of which is done through HR. This greater reliance on HR as a business function
has led to the adoption of much stronger recruitment methods which, for reasons that will follow, enable companies to meet legislative requirements and promote fair practice.

In many workforces, it has become clear in recent years that employability tests are used for purposes other than just performance testing. They provide a platform that assesses candidates on merit rather than on personal criteria, reducing the impact of discriminatory practices [4]. Due to equal opportunity legislation in many countries, which most commonly relates to the differing proportions of ethnic groups hired, many employers could be vulnerable to prosecution [58]. Standardised psychometric tests can therefore be used as a way to reduce bias and discrimination [33]. One factor explaining the increased use of these tests may therefore be a prudential response to changes in hiring attitudes and legislation. On the other hand, the opposite has also been said: companies need to keep legal compliance in mind when they use psychometric tests [12], so as not to offend candidates with irrelevant tests. In addition, the role of bias in these tests has been explored, as many psychologists and companies note that testing is an intrinsically culturally biased procedure that can cause discrimination against ethnic minorities. This is a result of cultural differences leading to consistently different answers across several social groups, although this applies more to judgement and situational tests than to the numerical and verbal reasoning tests on which our research focuses [30].

The rise of these tests could also be attributed to the workplace's lower regard for formal qualifications as a method of streaming candidates and predicting their future abilities [33].
This may be because young labour-force entrants across the EU have much higher attainments than they previously did, making it harder than in the past to sort applicants at the top end of the spectrum by attainment alone. This may lead employers to screen applicants much more carefully [33]. Potentially this was caused by the previous decade of education being hailed as 'too easy' [27], which pushed achievements very high. Periods like this can have knock-on effects on recruitment methods, as a reaction to these 'more qualified' applicants filtering through the recruitment system and into the business environment. However, this may be subject to change, given that recent education reforms claiming to 'toughen up' the curriculum have yet to see their full effect, particularly in terms of employment. Examples of a lack of faith in the education system can be seen in the actions of top employers. An example of
this is one of the 'big four' professional services firms, Ernst & Young [26], who have recently changed their application rules so that educational attainments, such as degree class, are no longer taken into account. Instead, they believe that their own in-house testing and assessment centres are a sufficiently reliable indicator of whether candidates will succeed [26]. Another example was the introduction of the Army's own mathematics test for applicants, developed because it had become increasingly challenging to use GCSE mathematics results as a discriminator amongst applicants for technician roles [33]. If formal qualifications continue to be an insufficient indicator of applicants' abilities, then companies will have to find new methods to screen them, as is already happening with the increase in psychometric testing.

When beginning our research, we went down many different routes to get a broad range of information. Through emails and other correspondence, we identified a few problems that employers encounter with these psychometric tests. Firstly, they are not always sat in test centres, and many are taken online. This leaves open the possibility that people may try to cheat by getting others to sit the tests on their behalf [4]. This is unfair on other candidates, as well as misrepresentative, allowing people who may not be suited to a role to progress further in applications than they otherwise would. Having said this, most of these tests are designed so that they are fairly difficult to cheat on, for instance by imposing time constraints [53]. We have also found that these tests are mostly used as a means of filtering candidates, so passing them does not necessarily guarantee any further success. Secondly, some companies have said that the tests may be unrepresentative, since people only get one chance to take them [53].
Due to many different circumstances, a candidate may well underperform on the test and so fail to demonstrate their full potential. This could cause companies to miss out on hiring perfectly well-suited candidates, in which case the tests would be causing a misallocation of their resources. Some companies have a validation test in place that allows people who got unexpected results to retake the test; obviously, however, not all companies guard against inconsistencies in this way [53]. In any case, many recruiters we spoke to stated that these tests and their scores are used only to assist the recruitment process, and are not the sole factor in employing people [51]. Instead, they are used as guidance to help make informed decisions about applicants, so a well-rounded application is essential in addition to these tests [4].
"Numerical reasoning is the ability to understand, interpret and logically evaluate numerical information. Numerical reasoning is a major facet of general cognitive ability, the strongest overall predictor of job performance" [44].

Because numerical reasoning assessments measure this facet directly, they are seen as the 'best overall predictor of job performance' [44]. Numerical and verbal reasoning tests are combined into an overall aptitude assessment that highlights the most well-rounded, suitable people for the job. Aptitude tests show employers skills that cannot be replicated in interviews, nor observed by reading CVs and past references. They are a true, accurate and quick assessment of how candidates perform on the spot in a pressured environment. The 'government mathematical report' [25], alongside careers websites such as Assessment Day Ltd [37] and Inside Careers [31], agree that the only mathematical abilities tested in numerical assessments are addition and subtraction, multiplication, percentages, currency conversions, fractions and ratios. In addition, they test the ability to "interpret the tables and graphs correctly in order to find the right numbers to work with" [31]. Numerical reasoning tests are normally timed, in order to measure applicants' ability to think on their feet and solve problems under time pressure.

Prospects [45], a website designed to help people looking for jobs, states that employers in most industries are looking for applicants with planning and research skills, i.e. those with the ability to find relevant information from a variety of sources. Information can be presented in a variety of ways, such as numbers, statistics or text in tables, graphs and reports. Employees need to be able to understand, analyse and interpret research and use it appropriately. Numerical assessments test these exact skills.
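As a brief illustration of the arithmetic these assessments draw on, the figures below are invented, but the operations (a percentage change and a currency conversion, starting from values "read off a table") are the ones the sources above list:

```python
# A typical numerical-reasoning item combines reading values from a table
# with percentage and currency arithmetic. All figures are hypothetical.

revenue_2013 = 240_000  # GBP, as if read from a data table
revenue_2014 = 276_000  # GBP

# Percentage change: (new - old) / old * 100, here approximately 15%.
pct_change = (revenue_2014 - revenue_2013) / revenue_2013 * 100

# Currency conversion at an assumed rate of 1 GBP = 1.30 USD.
gbp_to_usd = 1.30
revenue_2014_usd = revenue_2014 * gbp_to_usd
```

Each operation is elementary; the skill being measured is extracting the right two numbers and choosing the right operation quickly and under pressure.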
In addition, tests can have varied levels of difficulty to reflect the level of numerical skill needed for the specific job. SHL Talent Measurement creates a wide range of tests, from aptitude and personality tests to customised tests for individual companies [8]. They create a variety of tests appropriate for different job levels and industries. Numerical reasoning tests can be adapted to include more complex questions requiring a more advanced level of numerical knowledge and skill; another way of making them more challenging is to shorten the time available to complete the test. SHL state that their tests represent the 'level of work and academic experience' [8] required for a specific job role.
For example, SHL released an 'Aptitude: Identify the Best Talent Faster and at Less Cost' brochure [9], stating that a semi-skilled staff job requires a VERIFY Calculation Test, whereas a director or senior managerial role needs the far more advanced VERIFY Numerical Test. Furthermore, as numerical reasoning is just one aspect of an aptitude assessment, applicants applying for highly numerical jobs may also be asked to take a verbal reasoning test: in all jobs, the ability to communicate with colleagues is essential. This reiterates that aptitude tests are used to find the overall highest-calibre applicant.

A job application process is not a simple task. For many applications, candidates must spend hours researching the company before writing the application form and preparing for interview. Practising the skills examined in numerical tests is just another aspect of a job application that requires preparation. Does an applicant's mark improve with practice? If so, then applicants can practise in order to achieve high results, no matter what degree they study or how long it has been since they last studied mathematics. For example, even an applicant who stopped studying mathematics at GCSE level can use the numerous online resources available to practise and prepare for numerical tests, and hence could 'revise' for such a test and potentially perform very well.

The overall consensus from our sources is that the numerical tests used by large companies (especially those with large numbers of applicants) are generally a candidate-streaming process. With UK education standards rising and a larger number of students entering higher education (in January 2015, 592,000 people had applied to university, up 2% from the year before) [11], more people are eligible to apply for graduate scheme jobs.
High Fliers Research presented their findings in a report, 'The Graduate Market in 2014', covered by the Telegraph [49], which stated that graduate schemes now receive approximately 39 applications for every available job. With the number of students applying to such schemes high and rising, it is extremely hard to differentiate between candidates who have all achieved high grades and well-regarded university degrees. How do you select the 'best' candidate from thousands of similar applications? Because of this difficulty, companies use these tests to reduce the number of applicants they consider in the next application step. According to Personnel Today [47], 80% of companies use standard off-the-shelf numerical tests provided by companies such as SHL. Only 18% use a test tailored to measure the unique, customised skills they are
looking for. Some would argue that, since off-the-shelf tests are not unique to a company, such a numerical test will not truly assess competency for a specific job role.

1.2.5 How do people learn through computer-based assessment? What works and what does not?

Another topic we explored was how people learn through computer-based assessment. There are many methods that aid learning on a computer; the most popular and commonly used are multiple-choice or true/false questions, labelling images, rank ordering and gap filling. Computer-based assessments can be very popular with both students and teachers. They increase student confidence, and students like them because they get rapid, if not immediate, results; they can even be completed in a student's own time, when they are ready to do so. A teacher is also likely to use these methods as a way of administering frequent formative or summative assessments, since less time is spent marking. The teacher can then not only spend more time adapting their teaching methods (depending on the results of these assessments), but can do so reasonably soon after the test is taken [39].

Feedback is crucial to the learning process and, as mentioned, one of the advantages of immediate feedback is that the student receives their result straight away, rather than after they have moved on from a particular topic. A study conducted at the University of Plymouth [36] compared two groups of students, one using several online materials with two levels of feedback and another using none of them, to see how they performed in an end-of-module summative assessment. The group using the available study materials performed significantly better than the other group.

Although computer-based assessments can greatly benefit a student's learning, there are concerns that online tasks, especially multiple-choice questions, do not encourage deep thinking about a topic and so do not aid learning [34].
In order to be as beneficial as possible, these assessments need to both engage and motivate students.
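The immediate-feedback mechanic discussed above can be sketched minimally: one multiple-choice item whose explanation is returned the moment an answer is submitted. The question content and data-structure fields are our own invention, not taken from any real test:

```python
# One multiple-choice item with immediate feedback: the participant learns
# whether they were right, and why, as soon as they answer. The question
# text and the dictionary layout are illustrative assumptions.

question = {
    "text": "What is 20% of 450?",
    "options": {"a": 80, "b": 90, "c": 95},
    "correct": "b",
    "explanation": "20% of 450 = 0.20 * 450 = 90.",
}

def mark_answer(item, chosen):
    """Return (is_correct, feedback) so feedback can be shown instantly."""
    if chosen == item["correct"]:
        return True, "Correct. " + item["explanation"]
    return False, "Incorrect. " + item["explanation"]

ok, feedback = mark_answer(question, "a")
# ok is False; feedback explains the working straight away.
```

Bundling the explanation with the item is what makes feedback instantaneous: no marking pass is needed, which is exactly the advantage for teachers noted above.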
1.2.6 Will different types of learners (kinaesthetic, visual etc.) have different levels of numeracy?

Our final area of research was different learner types, and whether some would be better at numeracy than others. According to ESL KidStuff, there are many different types of learners, such as tactile, global and analytic; however, most people fall into at least one of the following three categories: kinaesthetic, visual and auditory [52]. Katie Lepi [35] describes these types of learners in her article, "The 7 Styles of Learning: Which Works For You?". She describes kinaesthetic (or physical) learners as people who prefer using their bodies, hands and sense of touch. Writing and drawing diagrams are physical activities, so this sort of activity really helps them learn; role-play is another commonly used activity for these learners. They often take a 'hands-on' approach, so they learn best by taking part in physical activities. Visual learners, on the other hand, do better by looking at graphs, watching someone give a demonstration, or simply by reading. Finally, auditory learners are the kind of people who would rather listen to something being explained than read about it themselves. A common way for them to study is to recite information aloud or to listen to recordings; they also usually like to listen to music while they study [57].

There are many different learning styles, and even though most people use a combination of all three techniques, they usually have an idea of how they learn best. Knowing what type of learner you are from a young age puts you at an advantage. However, it is also important to adapt your learning techniques whilst you are young, so that you are able to use each technique effectively [48]. Our aim is to see if there is a correlation between numerical ability (based on our test results) and type of learner.
We understand that online computer-based assessments mainly cater to visual learners. Rather than changing our online test to compensate for this, we hope to test this theory as part of our analysis.
2 Methodology

2.1 Group Organisation

In this section, we discuss how we took full advantage of the time given to complete this project by organising the group members efficiently.

2.1.1 Meeting Times

In order to make the most of our meetings, it was important to choose a time that suited everyone. We decided it would be best to meet two to three times a week, including a weekly meeting with our project advisor. We initially discovered that there were not many slots in the week that we could all make, due to timetable clashes. To make things clearer, we used the widely used online scheduling tool Doodle (see Figure 1) to pick a convenient time for all group members. Doodle worked well: it was quick and efficient, and prevented the confusion we had found when suggesting times among ourselves. In the first few weeks of the project we met frequently; as the term progressed, we settled on fixed weekly meetings, 15:30-16:30 on Mondays and 10:00-12:00 on Wednesdays. To make sure we had a private space for every meeting, we assigned one person to be responsible for booking rooms. During these meetings we would discuss the development of the project, updating each other on the progress of our individual responsibilities and delegating future tasks.
Figure 1: An example of us using Doodle to decide on suitable times for our group meetings.

2.1.2 Communication

One in seven people now use Facebook to connect with their family and friends [32]; it is the most popular form of social media. As a result, we decided that the best form of communication between group members would be Facebook. We created a closed group (see Figure 2) so that we could share files containing any work we had completed. We also exchanged numbers and created a group chat on WhatsApp, an instant messaging application. The team looked into using Google Docs to keep and edit our work, but found we were limited by it: the site required a Google account, which not all group members had, and it was more difficult to facilitate comments and project-related discussions. In contrast, our Facebook group allowed all of these things, and it was quickly decided that this would be our main form of communication, as no other platform worked more efficiently.
Figure 2: Evidence that we created a closed Facebook group with all members.

2.1.3 Combatting Risk

The decision to use Facebook as our main method of communication was ideal for our project. It minimised the possibility of losing files and data, which would have had a huge impact on the project. The use of a closed group meant every member could access and upload documents quickly and efficiently throughout the project, so that the rest of the group could edit key information or findings if necessary. We also decided to split into subgroups, which combatted the risk of absence: if one member of a subgroup was unable to complete a certain piece of research, for example due to illness, the other members of the subgroup would be able to finish it, since they would have a good understanding of the task from having studied the same topic.
Initially we went about identifying all the tasks and activities we wanted to complete throughout our group project. We were then able to create a critical path (see Figure 3) to see if we would be able to finish all these tasks within the time available. The critical path also allowed us to recognise what needed to be prioritised and what could be completed in parallel. Figure 3: Our Critical Path Analysis. 2.1.4 Subgroups Once we had highlighted the key parts of our project, we decided that we would split into subgroups to spread the workload. This enabled us to undertake multiple tasks at once, so that we could collaborate to meet our timeframe. The four groups were: writing the questions, programming, statistics, and writing up the report. When deciding whom to put in which subgroup, we asked each individual what their strengths and weaknesses were, in order to best utilise our skills; for instance, some members of the group preferred programming to statistics.
Deciding who would be in each subgroup was not difficult. Some members of the team were interested in the creative nature of writing the questions, while others had enjoyed computer programming modules taken in previous years. We decided to put more people into the programming subgroup, having highlighted early on that this was probably going to be the most time-consuming part of the project, and that there was not a lot of previous programming knowledge within the group. Some members had statistically analysed models in the past, so they formed a statistics subgroup. Finally, another subgroup put themselves forward for editing and compiling the final report, as they had experience working with LaTeX and enjoyed editing written material. Even though the final version of the report would be passed through this subgroup, everyone took a very active role in the write-up of the report. 2.2 Data Collection 2.2.1 Preliminary data collection The next stage for our team was to gather preliminary data to aid our project - in particular the development of our own online test. We started by doing some initial research around our topic, in order to find areas that we could look into further. After discussing our initial findings, we came up with four main topics that we would research further, as stated in our introduction. As a result, we had to forgo many other interesting areas, but we decided that these were the four most relevant areas on which to focus our objectives. We also felt that including any more areas of study would not leave us enough time to complete the project, nor would we be able to write about them in sufficient depth. We split our team into four two-person groups and assigned a different area of research to each one, so as to manage our time and resources more efficiently. The only downside of this was that not everyone in the group was fully informed on every topic.
However, this was easily overcome by compiling our research into one document, and making it available on every social platform that we were using. We went about our research in a variety of ways. Firstly, we used available literature such as papers, articles, books and websites to find evidence for or against our initial thoughts on each topic. This demanded strong information-gathering and analytical skills on the part of the researchers, who had to read through huge amounts of information
and extract the necessary details in an articulate way. In addition, we carried out primary data collection by emailing and contacting relevant sources, such as employers, online test providers and academics. For some of these we established individual contact, asking them specifically for advice or more information on our project, but for the bulk of employers and careers websites, we generated a questionnaire to distribute to them. We decided to do this in bulk after quickly realising that not many companies were responding to our emails. This could have been because they were not interested in our group project, or because some companies were too large to assign a contact or specific department to respond to us. Using input from the separate research groups, so as to make the questionnaire as relevant and useful as possible, we asked a range of questions. This questionnaire was also in a far easier format for companies to respond to, as it saved them the time and effort of formulating unassisted responses. Bulk distribution ensured that we got as many responses as we could in the limited time frame we had to complete our research. Once the research stage of the project had been completed and we had all our necessary sources, we began the write-up. Within our subgroups, we compiled our best findings and formalised them for our report. We each wrote up our sections, complete with references, ready to be passed along to the editing team. With this, we also included a full write-up of our reference information to go into our bibliography. 2.2.2 Survey Now that the research stage of our project had been completed, it was time to move forward with the creation of our own online resource to test our findings. After discussing it as a group, we decided that one of the easiest and quickest ways to gather information was by creating an online survey.
We felt that this would be far quicker to distribute and to analyse than other methods, such as focus groups, giving us less of a time constraint. The aims of the survey were, firstly, to test some of the conclusions and theories formed from our research and discuss what this showed, and secondly, to help us create our computer-based assessment by finding out what students find most useful when they are learning. To do this, we asked several questions about learning techniques, types of learners and effective testing methods. We then passed this information on to the subgroup in charge of writing the questions for our online test. They used the
survey feedback to help us create a test in response to what people preferred. We felt this would give us a more tailored test, written in the most helpful way for students. The fact that the test was designed with student input in mind meant that we could try to benefit test participants, and hopefully improve on currently available tests. Figure 4: The first page of our survey. We created the survey using Google documents (see Figure 4) and set it up as a form. We looked into other online survey distributors but found Google documents to be the best platform, as most required a payment to release a survey containing more than 10 questions. Google Forms allowed us to ask an unlimited number of questions, was quick and easy to use, and exported our data straight into Excel for us to analyse. Using contributions from all research groups, we generated a draft survey. The survey was then checked by the group to ensure it was appropriate before it was released. This allowed us to make a few necessary changes to the wording and remove overlapping questions in order to shorten the test. We were aware that people might be put off taking our survey if it was too lengthy and therefore time consuming. For this reason, we tried to ensure that most questions could be answered either with multiple choice or with a scale of agreement. We also tried to make sure that the survey took no longer than 15 minutes to complete. We then distributed the survey to the public so that we could analyse our results as quickly as possible, given that we were on a tight schedule. To take the survey, all that was needed was the web link. We spread this link across as many social media platforms as we could, including Facebook and Whatsapp. We felt that this would be the quickest way to distribute our survey, as it would target our main audience, students, in a way that was easily accessible for them. The fact that the form was created online made analysis far easier, as we could see responses as they came in; by keeping an eye on the data, we were able to start analysing the feedback as soon as we had a sufficient number of responses. After approximately a week, we had gathered a large number of responses, and when numbers began to plateau we decided to start reviewing the data. 2.3 Test design, creation and analysis 2.3.1 Producing the Questions While the programming group focused on the technical aspects of creating a computer-based assessment, those tasked with writing questions for the test had to make sure they referred back to the information we had already collected about online assessments. We started off by looking at results from our survey in order to determine what types of questions we ought to be asking. As found in our initial research, multiple choice questions were the preferred method of answering.
The survey showed us that gap filling was the least popular method; however, we decided that we would still include questions of this form in our test, for two reasons. Firstly, it is the most accurate way of seeing whether a student has really understood a question, since they cannot guess the answer, and secondly, we thought it would be interesting to see if students tended to do worse on these types of questions, as we had hypothesised.
The next stage was to decide what topics to base our questions on. We wanted to focus on the numerical reasoning style of questions, just like on the currently available employability tests. We did this by researching these numerical reasoning tests and replicating their style of questions. This ensured our test was relevant and had the potential to prepare people for such testing. Some initial points raised focused on the types of questions we would have to ask, what topics we would focus on, and how many levels of difficulty we should have. It was also noted that our questions would have to be both realistic to program and relevant to our research, in order for the results to provide useful information that the statistics group would then be able to analyse. Each member of the subgroup was then tasked with a different research assignment. One member focused on how to effectively test different learner types, while the other two members focused on looking up example questions at different levels of difficulty. Having done this, it emerged that online tests naturally cater more for visual learners and not for the other two learner types [10] [40] [50]. We took the decision not to focus on this aspect when writing our questions, as we would not be able to create different types of questions for a specific learning style, other than visual. Having established that a variety of levels was essential to fulfil our aim of creating an adaptive test, it remained to decide which difficulties we would pick. Since we knew that all participants would have a minimum of GCSE-level mathematics or an equivalent qualification, but not necessarily any further qualifications, we decided to make this our top level of difficulty.
However, after their preliminary discussion with the statistician Dr Ben Youngman, the statistics subgroup informed us that having more than three levels of difficulty in our test would significantly hinder statistical analysis of the data later in the study, as we would be unable to create an effective model. On the other hand, we were concerned that too few levels would reduce the range of results: with six similar questions at the same level, it was likely that a participant who could answer one question correctly could complete them all. For this reason, we decided to incorporate KS2, KS3 and GCSE-level mathematics. The final element of the decision-making process involved reading through the current curricula for Key Stages 2 and 3, as well as GCSE-level mathematics, in order to single out the recurring, most important topics so that we could
base our test questions around them [22] [24] [23]. The final decision we made was to write two questions for each of ’percentages’, ’ratios’ and ’algebra’ at Key Stages 2 and 3, and then to write six GCSE statistics questions, which would incorporate these topics. In this way, we would have 3 multiple choice and 3 gap-fill questions at each level of difficulty. Once we’d made all the relevant choices, it was time to write the questions. We found examples of questions on the topics we were focusing on by looking at teaching resources websites, such as TES [2] [54]. We then adapted these to suit our own needs: not only did we want to model questions to resemble currently available online assessments, we also had to generate wrong answers for every question that was to be multiple choice. This was the hardest element of the process, as it involved deliberately making common mistakes with the aim of generating plausible wrong answers. Luckily, this was achievable, and because we diligently wrote down our thought processes, we were able to relate how we’d created these wrong answers to the programming team, so that they had an algorithm to use in the randomisation of questions later in the process. 2.3.2 Programming the Test In this section, we discuss the writing of our online test. Our test acted as a vehicle to provide relevant data to help answer the theories we had posed from our research. This meant it was an integral part of our project outcome and was therefore very important to us. We began meetings regarding the creation of the test very early on, as we were aware that it would be a very time-consuming part of our project. In these, we discussed how we were going to approach the programming aspect. Firstly, we had to choose the programming languages that we would use. We looked into a few different methods. Our first idea involved using the Exeter Learning Environment, so that all Exeter students would be able to easily access the test.
We thought this would help with distribution, as this website is used by all students at the university; however, the programming behind the website was far too restrictive for what we had planned with regards to coding. It also presented the problem that our results would be restricted to one university. Another option was a version of Maple that would both code and present our questions, but it became apparent that it would not facilitate certain aspects of our test, such as feedback and randomisation. After exploring these different options with
our project advisor, we decided it was best to use the popular server-side language PHP, together with HTML, to code the questions, and to store data in a MySQL database. We chose PHP as it is a relatively simple language that integrates easily with HTML, the main language used for the appearance of web pages and the one our questions would need to be written in. It was also the most flexible language, so it would not restrict us in the design of our test and would enable us to create dynamic web pages involving randomised variables. This was very important, as many of our test aims involved randomisation and forms, both of which PHP facilitates, enabling us to move information on and off our database effectively. The only limitation was that, before we had access to an online server, we would find it difficult to practise running our code. This was overcome by using XAMPP, free software that replicates the process of using a server but can be run offline. This meant that we could run our test as it was developed, in order to check its appearance at every stage.
Figure 5: Above is an example of the PHP code we used to generate Question 1 in our Numerical Reasoning Assessment.
Figure 6: Above is an example of HTML code being echoed in PHP, which we used to submit Question 1 in our Numerical Reasoning Assessment. The subgroups had researched the current curricula and decided on the layout and contents of the test, so the next step was for the programming team to create it. Firstly, we familiarised ourselves with both PHP and HTML and got used to writing functions. We used a variety of resources from the library [13] and the internet [59], as well as our own previously acquired skills. We aimed to understand how to print text, show images and generate tables using HTML, so that we could write a well-presented and professional-looking test. We also had to learn how to interact with our online database, move data on and off it, and store our results. Following this, we split the workload between five people, each person being in charge of certain questions and aspects of the test. The limitations we came across were
the time constraints on programming, because of the short 10-week period. Due to our initially low level of programming skill, a significant amount of time was spent familiarising ourselves with the chosen languages and understanding their capabilities. The starting page of our test provided some preparatory information on materials the participant would require, as well as explaining the procedure of the test. The voluntary nature of the test was specified, to ensure the participants did not feel pressured and knew they could terminate at any point. The second page of the test was dedicated to data collection, gathering information on age, gender, subject area and GCSE mathematics grade, as well as how long it had been since they had last studied mathematics. We also collected each participant's university ID, which was then used as an identifier. This was in case a participant chose to sit the test more than once, so we would be able to determine whether their mark improved. The scores awarded were also linked to this identifier, so only rows with a matching identifier would be changed, with a score of one for a correct answer and a zero otherwise. We also asked the participants what type of learner they thought they were, by providing relevant descriptions, to help us determine in later data analysis whether that had an effect on their mark. Another piece of data collected throughout the test was the time it took participants to complete each question, which we recorded using timestamps in PHP. This helped to determine whether any cheating took place, as well as identifying the questions found most difficult. All of us already had a sufficient understanding of the code needed for the write-up of the questions, as they were only tables and simple text and so were quick to do, allowing us to concentrate on the more complex parts of the programming described below.
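The scoring and timing mechanism just described can be sketched as follows. This is an illustrative Python stand-in for what we actually implemented in PHP with a MySQL table; the function names, fields and the example ID are invented for the illustration.

```python
import time

# Hypothetical in-memory stand-in for our MySQL results table:
# one row per participant, keyed by the university ID used as the identifier.
results = {}

def start_question(participant_id, question_no):
    """Record a timestamp when a question page is served."""
    results.setdefault(participant_id, {})[f"q{question_no}_start"] = time.time()

def submit_answer(participant_id, question_no, given, correct):
    """Score one for a correct answer, zero otherwise, and store the time taken."""
    row = results[participant_id]
    row[f"q{question_no}_score"] = 1 if given == correct else 0
    row[f"q{question_no}_secs"] = time.time() - row[f"q{question_no}_start"]

# Usage: a participant answers Question 1 correctly.
start_question("670012345", 1)
submit_answer("670012345", 1, given=42, correct=42)
```

Because every update is keyed by the identifier, a resit by the same participant simply overwrites the matching row, which is what let us compare first and second attempts.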
Some of our questions (please refer to Figures 39 to 81 in the Appendix for screenshots of our Numerical Reasoning Assessment) included images, such as pie charts and stick diagrams, to cater for different types of learners, as mentioned earlier in this report. Initially we attempted to code these images rather than simply inserting them, so that we would be able to adapt them, but soon realised this would be an unrealistic target given the short time and limited skills we had. As a group, we made the decision to include them as static JPEG images instead, deciding that the impact of this would be very small. In certain instances we could avoid this limitation, as we were still able to randomise the questions. For others, we decided it was more important to meet our time constraints and
generate our statistics than to worry about randomisation. As we wanted to produce questions with both multiple choice answers and manual input answers, two types of code had to be written. The approach to writing the multiple choice questions was more complex and time consuming, as realistic answers had to be developed so that the mistakes were believable and the correct answer was not too obvious. However, recording both types of answers as either right or wrong used the same procedure: defining a correct answer, comparing the given answer to it, and assigning a value of one or zero accordingly. One of the main aims of our project was to build a test that provided immediate feedback, in order to help students improve as they went along and to provide understanding if they made any mistakes. Therefore, following every question there was a separate page with a full step-by-step solution showing how it should have been approached. Another goal of ours was to randomise all of our questions. This involved randomising any values used within the questions, so that although the approach and the formula were the same, the question values and answers would be different every time the page was opened. We chose to do this to prevent people from cheating if sitting the test alongside others. It also enabled us to see more accurately whether people's performance improved if they sat the test more than once. The process of randomisation made creating false multiple choice answers and providing feedback more complex. Multiple choice false answers were created using formulas covering the common mistakes, and the values used within them had to be fetched and carried through to the PHP page that submitted scores. The same page also provided the feedback, so the values were carried through to the worked explanation as well. Another one of our initial aims was to make the test adaptive, so that the next question depended on whether or not you had got the previous one correct.
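The randomisation and distractor approach described above can be sketched as below. This is a Python illustration rather than the PHP we actually used, and the question topic and common-mistake formulas are invented examples, not our real test items.

```python
import random

def percentage_question(rng=random.Random()):
    """Generate a randomised percentage question whose wrong answers
    come from formulas encoding common mistakes, freshly drawn each
    time the page is opened."""
    whole = rng.randrange(200, 1000, 10)    # e.g. 640
    pct = rng.choice([5, 10, 15, 20, 25])   # e.g. 15
    correct = whole * pct / 100
    # Distractors built from common mistakes.
    distractors = {
        whole * pct / 10,   # slipped a decimal place
        whole / pct,        # divided by the percentage instead
        whole - pct,        # subtracted the percentage
    }
    distractors.discard(correct)            # guard against collisions
    options = [correct] + sorted(distractors)
    rng.shuffle(options)
    text = f"What is {pct}% of {whole}?"
    return text, options, correct
```

Because the same values feed both the options and the worked solution, the feedback page can carry the randomised numbers through to the explanation, as our PHP pages did.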
The purpose of an adaptive test was to enable people to reach an understanding of a topic before moving on to a more difficult question. The team began looking into various methods that would allow us to create banks of questions of varying difficulty. However, when our statistics team consulted our statistical advisor, he advised us that this would be far too hard to model, as we would have many different categories within our variables. Without a model we would be unable to analyse our statistics well or draw any consequential evidence from them to compare with our research. Moreover, with our limited programming skills, this would have taken far too long to complete within the time frame. It being such an
unrealistic target, we decided to exclude it, allowing us to concentrate on our other objectives. Despite time constraints and our initially limited skills, a test capable of gathering the required data was developed within the timescale. The next step was making our test live for participants to sit. We looked into different ways of doing this, but settled on uploading it to our university's servers. This meant that anyone with the web link would be able to access and sit our test, giving the greatest opportunity for people to take part. One other option we explored was paying for an online server, but this would have been costlier, and unnecessary when we had free resources. Another option was to use our university college intranet servers, but this would have limited respondents, as our test would then only be accessible to CEMPS (College of Engineering, Mathematics and Physical Sciences) students. 2.3.3 Test distribution To ensure statistically significant data analysis, our statistics subgroup required a minimum of 40 responses to the test. We were aware that we had a short amount of time available to distribute our test and that there were many potential difficulties in getting enough participants. As a result, we made a very concerted team effort to distribute the test widely, and as quickly as possible. We did this using a variety of social platforms, such as Facebook and Whatsapp, to raise awareness about the project and to provide a web link for people to take our test. A leaflet was also created to inform people about our test and the benefits it could provide, which we distributed on campus to encourage a wider spread of participants in terms of demographics such as degree type and age (see Figure 7).
Figure 7: A leaflet promoting our Numerical Assessment. 2.3.4 Test Analysis The first task for the statistics team was to identify what type of analysis we wanted to carry out on our test data. This needed to be completed at an early point in the project so we could relay it to the programming team, who then programmed the relevant questions into the test. We went about this task by breaking down each of the research sections, reading all the research findings, and then deciding the relevant statistics we needed to look into.

1. Why is mathematics important? The Mathematics vs Numeracy Debate.
   (a) Look at the correlation between test score and GCSE mathematics performance, degree and time since studying maths, to see if any of these affects the score.
2. Why do employers test for numeracy skills?
   (a) What was the average score? What was the range of scores?
   (b) What was the standard deviation of scores? This can identify whether numerical reasoning tests are able to differentiate between people.
   (c) What is the standard deviation in the score achieved by people studying the same degree?
   (d) Did anybody resit the test? Did they achieve a better score the second time?
   (e) What was the range, standard deviation and mean time taken to complete the test?
3. Do different learners perform differently on numerical reasoning tests?
   (a) Look at the correlation between score and type of learner.
   (b) Break down the questions categorically into charts, tables and text questions. Which type of question got the best score?
   (c) Do some types of learners perform better than others?
4. How do people learn through computer-based assessments?
   (a) Did people read the feedback? What was the average time spent between questions, on the feedback page? Plot the frequency of these times.
   (b) Did people perform better on the multiple choice questions or the manual input questions?
   (c) Did people speed up as they took the test?
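Several of the summary statistics in the plan above (average, range and standard deviation of scores) are straightforward to compute. A minimal Python illustration, using made-up scores rather than our actual data:

```python
from statistics import mean, stdev

def describe(scores):
    """Descriptive statistics from the analysis plan: mean, range and
    sample standard deviation of test scores."""
    return {
        "mean": mean(scores),
        "range": max(scores) - min(scores),
        "sd": stdev(scores),
    }

# Hypothetical scores out of 12 for six participants (not our results).
summary = describe([7, 9, 4, 11, 8, 9])
```

The standard deviation in particular is what question 2(b) relies on: a very small spread would suggest the test cannot differentiate between people.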
’Practical Regression and ANOVA using R’ [21] stated that regression analysis is beneficial because, firstly, predictions of future observations can be made; secondly, the relationship and effect of multiple variables can be assessed; and finally, a general understanding of the structure of the data can be gained. Therefore, for all the statistics required in each research topic, it was necessary to build a regression model for the test scores. The same article also identified the steps taken in regression analysis as:

1. Identifying the distribution of the data.
2. Identifying the initial regression model.
3. Carrying out an initial assessment of the goodness of fit of the model, through hypothesis tests on the variables and numerous diagnostic plots.
4. Using methods to identify the best model fit.

’Applied Regression Analysis’ [5] proposed using stepwise regression to achieve the ’best’ regression fit, because it avoids working with more variables than necessary while still improving the fit. Stepwise regression starts with a regression model with one variable, then subsequently adds and removes variables until the largest coefficient of determination is achieved; hence, the most significant model is identified. After this best regression is found, we will be able to identify which variables have the most significant effect on test scores. This is vital for answering our four research topics. We will also be able to make predictions on future scores, such as: what score would a ’visual-learning girl, studying law, with a grade B in GCSE mathematics, who hasn’t studied mathematics since GCSE’ achieve? It was clear that we would need to collect as many responses to our test as possible. We posed a question to ourselves: ’How many people need to take our test in order for the results to be significant?’. Having spoken to Dr.
Ben Youngman, a University of Exeter Statistics Lecturer, we agreed that we could not fix a ’minimum number’, and that the distribution of the scores would depend on the scores of those who took the test. It was clear that as few as four scores would be insufficient to build strong arguments from our findings, so as a group we set ourselves the aim of getting at least 60 entries.
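The stepwise idea described above can be sketched as a simple forward-selection loop that adds, at each step, the variable giving the biggest improvement in adjusted R². This is an illustrative Python/NumPy version, not the R code we actually ran, and the variable names are invented:

```python
import numpy as np

def adj_r2(X, y):
    """Adjusted R^2 of an ordinary least-squares fit with an intercept."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))

def forward_stepwise(X, y, names):
    """Greedy forward selection: repeatedly add the variable that most
    improves adjusted R^2, stopping when no addition helps."""
    chosen, best = [], -np.inf
    while True:
        gains = []
        for j, name in enumerate(names):
            if name in chosen:
                continue
            cols = [names.index(c) for c in chosen] + [j]
            gains.append((adj_r2(X[:, cols], y), name))
        if not gains:
            break
        score, name = max(gains)
        if score <= best:
            break
        best, chosen = score, chosen + [name]
    return chosen, best
```

A full stepwise procedure also considers dropping variables at each step; this forward-only sketch just conveys the selection principle used to avoid carrying more variables than necessary.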
We decided to use R to run all of our statistical analysis. R is a leading tool for statistics and data analysis. It very efficiently performs the types of analysis we required, such as producing correlation matrices and modelling data. R also integrates easily with other packages such as Microsoft Excel, making it simple for us to export our MySQL database, containing all the test data, into a Microsoft Excel spreadsheet and to perform our analysis in R from there. Output in R is presented in a very clear way that is easy to interpret. Our final reason for using R was that everyone in the statistics team had used it before, making us very familiar with its built-in functions and programming language. Additional reading of ’Practical Regression and ANOVA using R’ [21] was also used to refresh and improve our R knowledge. Figure 8: Above is an example of R code. 2.4 Report Feedback As a group, we recognised the importance of getting external feedback on our report. Our project's main aim was not just to create a test, but also to see how our findings related to the literature and to observe their potential impact on future students. Receiving opinions on our results would give us a more comprehensive view of our work and would enable us to perform a
more thorough and independent evaluation. We decided to contact experts via email, as we thought this would be the most efficient form of communication. Our first thought was to seek a statistician - we needed someone to evaluate our model and give feedback on our findings. We met with the same person who had advised us earlier in our project, Dr Ben Youngman, hoping that he would be able to advise us on anything we may have missed. We also sent our report to Rowanna Smith, the lead Careers Consultant for the College of Engineering, Mathematics and Physical Sciences, based in the Career Zone at the University of Exeter. We wanted to find out whether, based on our findings, the university would consider using a similar test as a resource made available to students in preparation for job application tests. We also wanted to find out whether our results were significant enough for the University Career Zone to consider a change in the advice they currently offer students with regard to preparing for these kinds of assessments. Our final port of call was SHL, a provider of numerical reasoning tests. We wondered if they would consider changing their test-writing methods based on our own assessment and its findings; for instance, the inclusion of feedback. We also asked whether they would consider taking into account different learner types by adapting their tests to suit a wider range of people and their learning habits. 2.4.1 Skill development - Graduate Skills The project we undertook led us to develop a variety of skills, as well as gain new ones. As the project involved a very tight time frame, a large amount of time management and task delegation had to take place to ensure all the different sections of the project came together effectively and on time. To enable this to happen, the project was broken down into separate sections, which helped us stay on track.
These enhanced skills will prove very useful in later life, as many graduate roles will require efficient management of many different tasks, most likely with tight deadlines. Not only did we have to manage our time well by setting realistic targets, but we also had to adapt to changes and challenges that occurred along the way. Over the course of the project, this enabled group members to become more flexible, something required in all future aspects of life. Working in a team has been an essential part of this project, without
which our outcome would have been completely unattainable. The ability to work in a team is an invaluable skill for later life and prepares us for situations both in and out of the workplace. The ability to communicate effectively with the other members was crucial in enabling the team to stay on track and be transparent, so that we could be aware of any potential problems. As a graduate, this is vital in order to be part of a working society. Another skill acquired during this project was the ability to research quantitatively and qualitatively, as well as to disseminate information and synthesise others' ideas. This process was approached in different ways, including a vast amount of reading and contacting both employers and academic members of staff, resulting in a well-rounded background for the report. Research skills are essential to many roles, either directly for graduates in technical roles, or indirectly as transferable skills by improving general analytical and summarising abilities. Designing the test to collect our data developed the team's problem-solving skills, as we had to explore several ways to achieve our programming criteria. It also gave us all a basic understanding of one of the most popular scripting languages on the web, an invaluable skill to many employers. The team also acquired skills in data collection and statistical analysis in order to understand and present the project's findings, something that many employers look for and value highly. A large aspect of our project involved presentation, both as small progress reports and as a final summary of our report. Through this, all group members had a chance to present their work to an audience, gaining beneficial speaking and performance practice, something we get very little chance to do due to the nature of our degree. This enables people to gain the vital social skills that employers hold in high regard and that make up a large component of job applications.
3 Findings

3.1 Survey

When it came to collecting survey results, it was reasonably simple to analyse our data. Having created the survey in Google Forms, we could monitor responses as they came in. Google Forms also produced some basic statistical representation for us, so we immediately had an overview of the key information. Overall, we gained 79 responses, which was much higher than our aim of a minimum of 40 respondents. In terms of demographics, we noticed that we had a higher number of female participants, with over 70% being women. Also, almost 70% of our respondents were in their third year of university and so dominated our responses (see Figure 9). This was likely due to the fact that our own group was made up of third-year students, who were predominantly female. However, due to the nature of our survey and the questions asked, we did not feel that this would cause any issue, especially considering that third-year students are the most likely to have come into contact with employability tests, and should also have a good idea of how they learn best at this stage in their education.

Figure 9: Pie Chart of Gender and Year of Study of participants in the Survey.
Figure 10: Bar Charts of responses in two survey questions.

The first set of questions in the survey gave us information on the different ways in which people like to learn and to be tested. The survey worked in two ways. Firstly, it acted as preliminary data for our research, through gathering more information and current opinions on online tests, which we planned to compare with our test findings later in the process. Secondly, the survey provided new data for us to compare with what the group had already learned from the research carried out. We found that the majority of people preferred multiple choice questions on online assessments, concurring with our research findings that this is a popular, commonly used method. It is worth noting that since possible answers are always provided, these questions do not require as much original thought on the part of the student. It also means that students already have a percentage probability of selecting the correct
answer, in our case on average 20%, something that may influence people's preference for this style, based on perceived comparative ease. The fact that this style was preferred was passed along to the subgroup tasked with writing the online assessment questions, so that this could be taken into account. It was also seen that people feel they benefit significantly from feedback. This matches the opinion we found when conducting research, based on a Plymouth study [36], which suggests that not only do people want feedback, but that a student's results can improve significantly as a result of it. This confirmed our decision to include feedback as a major component of our own online test, to ensure that people would be able to learn from their mistakes in previous questions. In terms of Mathematics vs. Numeracy, there was a mixture of results. There were originally mixed opinions when people were asked if they believed their mathematical skills had deteriorated since they had stopped studying mathematics, with the majority taking a neutral stance (see Figure 11). The second largest response was 'slightly agree with the statement', implying that slightly more people may feel this to be true. This may be slightly skewed, as people still currently studying mathematics are likely to strongly disagree that their abilities have deteriorated, given that they are still using them. This defeated the purpose of the question, which was to investigate people who have stopped studying maths and consequently do not use it as often. This may have been the reason for the large spike in people strongly disagreeing with the statement, which made it harder to analyse how people perceived their maths skills, as many of the results shown were not relevant.

Figure 11: Bar Chart of responses on deterioration of mathematical skills.
Figure 12: Bar Chart of responses on deterioration of mathematical skills, excluding mathematics students.

To combat this problem, we decided to exclude mathematicians from our data and to repeat our statistics (see Figure 12). This ensured that all respondents had finished studying mathematics, so that we could give a full representation of the deterioration of mathematics skills. From our new calculations, we then produced a graph similar to our expectations, showing that most people felt their skills had somewhat deteriorated since they had last used mathematics. This clearly agreed with our research, which showed a strong difference between people who currently study mathematics and those who had stopped. We could compare this with the similar effect of unemployment seen in our research. It was also similar to the study on nurses [20], who performed worse on a similar test after a gap year. However, our data consisted more of qualitative opinions than quantitative results. This slight difference meant that we could not draw any solid conclusions from comparing the two, but could, however, take note of the strong similarities. One limitation of our data may have come from the differing opinions on when participants classed themselves as having stopped studying maths. Some students who study more scientific or quantitative degrees may regard themselves as still using mathematics in their degree, given that they use it regularly in their university work, while others will claim not to study mathematics any more, since the subject itself is not
contained in their degree title. Despite this, we felt the discrepancy did not impact our results too heavily, as such students would still have been likely to be of the same opinion when it came to rating their mathematical ability, and so we could still assess the difference. Another slight limitation in comparing our data with the literature was that in some similar studies, those tested had been out of any form of study or work at the time, whereas the students in our survey were all still in academia. This would certainly have affected the extent to which they felt their mathematics skills had deteriorated over time, possibly making our results less pronounced than they otherwise would have been. In addition, our survey showed us that 67.5% of people (see Figure 13) believed numeracy and mathematics to be different things, which agreed with much of our research regarding the Mathematics Vs. Numeracy debate. This shows that the general consensus is that they are different disciplines and require different skills, even if they technically overlap by definition. It would have been beneficial to know why the students thought this, and whether they agreed with our research findings on potentially teaching them as two separate subjects. However, due to the design of our survey, we were limited to a few set answers and so it is difficult to say how consequential these results are. We attempted to overcome any potential gaps in a participant's knowledge by giving official definitions of both words, allowing them to make a well-informed decision, which may have helped to mitigate some of this problem.

Figure 13: Pie Chart representing the opinion of participants on Mathematics Vs. Numeracy.
Figure 14: Pie Chart representing how participants feel they learn best.
Since another large section of our research involved different kinds of learners, we included questions on this in our survey. Our research covered several learner types, but we chose to include only the main three we had focused on in the survey. The team found that the majority of participants fell into a set category, with less than 4% being unsure (see Figure 14). The smallest proportion was of those who believed themselves to be auditory learners; however, this was still over a fifth of respondents. The largest section was of the visual learners, with 41.8% of people placing themselves in this category. We mitigated the risk of people not being aware of the different types of learning, or of what category they may fall into, by asking people to say which description fitted them best, instead of having them pick from a list of unfamiliar definitions. However, there was still scope for people to have misunderstood and therefore picked a category despite not being sure, which may limit the reliability of our data. Having said this, our research showed that most people are a combination of these different learning techniques, so some crossover was always expected. In terms of how the different learner categories work, we believed that visual learners were likely to perform better in our chosen type of online numerical reasoning test, leaving the others at a disadvantage. When asked in our survey whether they believed these online tests cater for different learners, almost a third of participants responded negatively (see Figure 15). This helps to back up our research and hypothesis by showing that many people do not feel that their learning abilities are catered for. There is always the possibility of this proportion being overestimated by people who do not perform well in these tests in general, or who feel they should have performed better regardless of what type of learner they are.
Nevertheless, as we still have a strong majority, this should not have had a significant effect, and thus our data still shows that a significant number of people feel that they are not examined effectively in online tests. We were able to test this further in the results from our own numerical reasoning test.
Figure 15: Pie Chart representing the opinion of participants on whether Computer-based Assessments cater for different types of learners.

3.2 Test

Our numerical assessment consisted of 20 questions split into three difficulty levels: KS2, KS3 and GCSE. The average mark achieved was 15.28. From Figure 16, it can be seen that the majority of participants scored highly, with over 50% achieving a score greater than or equal to 15. Figure 17 supports this, showing an interquartile range of 6, from a score of 13 to a score of 19. The interquartile range shows a strong concentration of high scores. There is a negative skewness in the results. The highest score achieved was 20, showing that full marks were attainable, whereas the lowest score achieved was 5.
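The summary statistics discussed here (mean, quartiles, interquartile range, direction of skew) can be sketched with Python's standard library. The scores below are hypothetical stand-ins for illustration, not our actual data:

```python
import statistics

# Hypothetical test scores out of 20 (illustrative only; not the real data).
scores = [5, 9, 11, 13, 13, 14, 15, 15, 16, 17, 18, 19, 19, 20, 20]

mean = statistics.mean(scores)
q1, q2, q3 = statistics.quantiles(scores, n=4)  # lower quartile, median, upper quartile
iqr = q3 - q1

# A negative skew shows up as the mean sitting below the median (long left tail).
print(f"mean = {mean:.2f}, median = {q2}, IQR = {iqr}")
print("negatively skewed" if mean < q2 else "not negatively skewed")
```

In R the equivalent one-liners are `mean(scores)`, `quantile(scores)` and `IQR(scores)`.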
Figure 16: Histogram of Total Score.

Figure 17: Boxplot of Total Score.
Figure 18 supports the negative skewness of scores. There is an overall bell shape, suggesting a normal distribution; the slight shift of the peak to the right reflects the negative skew.

Figure 18: Density Plot of our model.

To further analyse our data, we will break down the statistics into the four research topics previously mentioned.

3.2.1 The Maths Vs. Numeracy Debate. Why is mathematics important?

The initial hypothesis was that a participant's score would deteriorate as the number of years since studying mathematics increased. Surprisingly, Figure 19 shows no clear correlation between score and years since studying mathematics, as the line of best fit is a horizontal line about the
mean score. However, the correlation coefficient is −0.21, showing a small negative correlation between the variables.

Figure 19: Scatter plot showing the total years since studying mathematics vs the total score.

Furthermore, our research into Numeracy vs Mathematics implied that numerical reasoning assessments do not test the skills which participants learn at GCSE-level maths, and that years since studying mathematics therefore has little effect on the score achieved. Our findings support this argument. However, because the average age of participants in our numerical reasoning assessment was 20.24 and the average number of years since studying mathematics was 1.68, this does not reflect the whole population. The correlation between GCSE mathematics grade and score is shown in Figure 20: a higher grade achieved at GCSE resulted in a higher score in our numerical reasoning test. The mean score achieved by a participant with grade B at GCSE was lower than the mean score for an A or A* candidate. The lowest score achieved by an A* grade participant
is higher than the lower quartile of A and B grade participants. The highest score achieved by any B grade participant is lower than the average score of an A* grade participant. From these findings, we can see that a strong mathematical background can result in a significantly higher numerical reasoning test score. As the number of years since studying mathematics has little correlation with the score achieved, this shows that mathematics GCSE grade and actual mathematical ability affect a participant's score more. This is again supported by Figure 21, which shows participants studying a mathematical degree. It is assumed that these students have strong mathematical abilities, and that this is the reason they achieved higher scores. We categorised 'mathematical degrees' as Economics, Business, Medicine, Mathematics and Science. The lowest mean score was for participants studying Humanities degrees. Interestingly, those studying a non-mathematical science (such as Biology) scored higher on average than those studying a mathematical science. However, Figure 21 shows that these results are actually very close. Therefore, we can interpret from this that all sciences require some mathematical skills.

Figure 20: Boxplot of Test Scores and GCSE Mathematics Grade.
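The correlation coefficient of −0.21 between years since studying mathematics and score was computed in R (`cor(years, score)`). As a minimal illustration of the underlying Pearson formula, here is a hand-rolled sketch on made-up (years, score) pairs, which are not our survey data:

```python
import math

# Illustrative pairs (years since studying maths, test score); not the real data.
years = [0, 0, 1, 1, 2, 3, 4, 5]
test_scores = [18, 15, 17, 14, 16, 13, 15, 12]

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson(years, test_scores)
print(f"r = {r:.2f}")  # negative: scores tend to fall as years increase
```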
Figure 21: Boxplot of Test Scores and Degree.

3.2.2 Why do employers use numerical reasoning testing?

As stated above, the average score achieved was 15.28, and the standard deviation of score results was 4.12. Standard deviation measures the degree of spread of the score results. Initial research into why employers use numerical reasoning assessments showed that these tests filter out applicants and help to differentiate between candidates with very similar applications. As our lower quartile is 13, 75% of participants achieved a score of 13 or higher. If an employer had a filter that cut out candidates scoring below 13, 25% of our participants would not have passed the test. This shows that numerical reasoning tests can be a useful tool to quickly remove weaker candidates from an application process. The standard deviation of 4.12 indicates a large spread in scores. This makes the test a useful tool for differentiating between candidates, as score results are varied and spread out over a wider range of values; not all participants will achieve similar scores. If everyone scored 15, they would all have to complete further assessments to gauge which was the best applicant. Having varied scores reduces this problem.
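The score-cutoff mechanism described above amounts to a simple filter over applicant scores. The sketch below illustrates it on hypothetical scores, using a cutoff of 13 to mirror our lower quartile; none of these values come from our data set:

```python
import statistics

# Hypothetical applicant scores out of 20 (illustrative only).
applicant_scores = [5, 8, 12, 13, 14, 15, 15, 16, 17, 18, 19, 20]
CUTOFF = 13  # mirrors the lower quartile of our results

# An employer's filter: only applicants at or above the cutoff progress.
passed = [s for s in applicant_scores if s >= CUTOFF]
spread = statistics.stdev(applicant_scores)  # sample standard deviation

print(f"{len(passed)}/{len(applicant_scores)} pass the {CUTOFF}-mark filter")
print(f"standard deviation = {spread:.2f}")  # larger spread -> easier to rank candidates
```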
Figure 21 shows that the majority of interquartile ranges of the different degree types are large. We see that applicants with similar degrees, where one would expect similar mathematical ability, still have a varied range of score results. This is useful for employers, as they can use numerical reasoning assessments to differentiate between applicants with the same degree title. Initially, we wanted to look into whether people had repeated the test to see if their score improved, because our research and survey findings showed that feedback and practice on numerical tests should improve one's score. The mean time to take the test was 19.47 minutes, meaning that on average people took 57.81 seconds on each question. This justifies why employers enforce tight time limits on numerical reasoning assessments (commonly a minute or less per question). This is not necessarily a method to filter out participants, but, as we can see from our timings, it means applicants are put under pressure when completing the numerical reasoning test. Employers are keen to find out if a potential employee can work under pressure and in a set time frame. The level of difficulty of the numerical test can also be adjusted by changing the time limit. If our numerical test had a time limit of 15 minutes, fewer than 50% of participants would have been able to finish the test. From initial research we found that numerical reasoning tests are often used even in applications where numerical skills may not actually be necessary. From our survey we found that 37.2% of people believed it was unfair to be numerically assessed in their career job applications and felt they were at a disadvantage to others because they were not 'good at maths' and 'had not studied it in a long time'.
However, from our findings, we can say that employers could increase the time limit on tests (for example, in our test, to over 35 minutes) so that every participant is able to complete the test in their own time and not miss questions because their time ran out. This is concluded from the fact that the box and whiskers in Figure 22 lie completely below 35 minutes, with only outlier times above.
Figure 22: Boxplot of Time Taken to complete the Test.

3.2.3 Do different learners perform better on numerical reasoning tests?

Figure 23 shows that visual learners on average achieved a higher score than auditory or kinaesthetic learners. Visual learners taking our test had the highest average and smallest range of scores. Literature research done at the beginning of our project, along with our initial survey findings, suggests that the numerical reasoning assessments used by employers online are not catered to auditory or kinaesthetic learners, with 64.1% of people who took our survey agreeing. The assessment being online limits the ability to make a numerical reasoning test practical and active enough to suit kinaesthetic learners. Audio numerical reasoning tests are available, but they are uncommon and usually only used for participants in special circumstances (such as those with visual impairments).
Figure 23: Boxplot of Test Scores and Learner Type.

Generally, people performed better in questions involving a visual aspect, such as a chart or graph. The average pass rate on these questions was 81.7%, whereas for text questions it was slightly lower, at 68.6%. This may be because the image or table breaks down the information, making it easier for all learners to digest the figures, whereas paragraphs of text and figures cater more towards visual learners.

3.2.4 How do people learn through computer-based assessments?

From our results, we can determine that the majority of participants neglected to read the feedback provided. The average times taken on the first four questions were 6, 5, 9 and 5 seconds respectively. This is not enough time to read, understand and learn from the feedback. Research proposed that reading feedback improves score results, for example Rob Lowry in 'Computer aided assessments - an effective tool' [36]. Our initial survey, Figure 24, also shows that 89.8% of people thought feedback would be a useful tool in an online test. However, as our numerical reasoning assessment was put forward as a 'test' rather than a casual learning resource, people's priority could have been to finish the test rather than learn from it.
Figure 24: Bar Chart of opinion on feedback from the survey.

If every multiple choice question was guessed, a participant would have a 20% chance of getting each one correct, and hence we can statistically approximate that they would receive 20% as their overall score on those questions. Therefore, a participant guessing all their multiple choice questions would statistically achieve 2.4/12 on average on these questions, a pass rate of 20%. Our results show a pass rate of 81% on multiple choice questions. This is significantly higher than 20%, suggesting that few (if any) candidates guessed all their answers. The average time taken and average pass rate for multiple choice questions were 50 seconds and 81% respectively. For fill-in-the-blank questions, the average time taken was 53 seconds and the average pass rate was 69.5%. From this we can conclude that multiple choice questions are easier and that a candidate has a stronger chance of scoring higher on them. Simply put, if a candidate's answer is not a multiple choice option, then they know it is wrong. In addition, if their answer is similar to an option available in the multiple choices, a participant can select this option and still have a chance of getting it correct. This is not possible in a 'fill in the blank' type question. This is supported by our survey, where 42.2% of people preferred multiple choice questions out of 8 different methods. Figure 25 shows that the average time taken to complete each question in our numerical reasoning assessment had no trend, as the line graph has no pattern and looks random. If people had learnt from the feedback provided, we would expect the time taken for each question to reduce as their understanding of the questions asked increased. It became apparent that the feedback we provided was not used, so we cannot support our initial thought. In addition, the incorporation of three difficulty levels (KS2, KS3 and GCSE) could have
counterbalanced the decrease in time taken, as the questions should have been getting more challenging.

Figure 25: Line graph of average time taken.

Furthermore, we looked at the average pass rate of the questions in each level of difficulty category. We divided our test into three categories: KS2, KS3 and GCSE. Figure 26 highlights that the average pass rate fell as the level of difficulty increased from KS2 to KS3. The average pass rate for KS2 level was 87.3%, whereas the pass rate for KS3 was 72.0%. The average pass rate was consistent from KS3 to GCSE level, both being 72.0%. Our research supports the idea that employers can use numerical reasoning tests of different difficulty levels to control how many applicants pass through to the next stage of the application process. Participants taking a KS2 level numerical reasoning test would achieve a higher grade than those taking a GCSE or KS3 level numerical reasoning test.
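The guessing baseline used earlier in this section (2.4 out of 12 multiple-choice questions, i.e. 20%) is just the expectation of a binomial variable with 12 trials and success probability 1/5:

```python
# Expected score from pure guessing on the multiple-choice questions.
N_MCQ = 12       # number of multiple-choice questions in our test
P_GUESS = 1 / 5  # five answer options per question, so P(correct) = 0.2

# Expectation of a Binomial(n, p) variable is n * p.
expected_correct = N_MCQ * P_GUESS
print(f"expected MCQ score from guessing: {expected_correct:.1f}/{N_MCQ}")  # 2.4/12, i.e. 20%
```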
Figure 26: Bar Chart of question category and average pass rate on the questions in that section.

3.2.5 Regression Modelling

The density plot in Figure 18 supports the hypothesis that score results follow a normal distribution (as previously stated, this can be concluded from the bell-shaped figure). The first multiple linear regression model fitted involved the following variables: degree, years since studying mathematics, GCSE mathematics grade and type of learner. For research into our four topic questions, we need to evaluate the effect all these variables have on the overall score of the participant. The full summary of the regression model used can be viewed in the appendix. As the variables degree, GCSE mathematics grade and type of learner are categorical, they are interpreted in R as factors with levels. The regression formula for this model is:

Y = 19.084 − 0.501X1 − 3.266X2 − 2.254X3 − 0.291X4 − 0.006X5 + 3.004X6 − 3.817X7 − 2.486X8 − 1.093X9 + 0.170W − 4.178Z1 − 3.060Z2 − 2.088K1 − 0.247K2,

where Y is the test score. By using factors we limit the auxiliary variables X1, X2, X3, X4, X5, X6, X7, X8, X9, Z1, Z2, K1, K2 to binary (0, 1) values. The X variables relate to degree, the W variable relates to years since studying mathematics, the Z variables relate to GCSE mathematics grade and the K variables relate to the type of learner. The p-values for the variables are: Degree = 0.061498, Years since studying maths = 0.787063, GCSE mathematics grade = 0.007201 and Type
of learner = 0.224736 (these can be seen in the ANOVA table for the model in the Appendix Section, in Figure 85). At a 10% significance level, the coefficients for Degree and GCSE mathematics grade are significantly different from zero (0.061498 and 0.007201 are both smaller than 0.1). However, Years since studying maths and Type of learner are not significantly different from zero (0.787063 and 0.224736 are greater than 0.1). Therefore, at a 10% significance level, the variables Degree and GCSE mathematics grade do significantly affect the test score result, whereas Years since studying mathematics and Type of learner do not. At a 5% significance level, GCSE mathematics grade is the only variable affecting the test score; the other variables would have coefficients not significantly different from zero. The adjusted coefficient of determination is 0.2806, meaning that around 28.06% of the variation in the total score can be explained by the variables used. We then used graphs created in R to help us assess the goodness of fit. Residuals should be evenly distributed. The 'Density estimate of Residuals' in Figure 27 shows a bell-shaped graph, with the peak at about 0. Therefore, the normal distribution of residuals assumption is met. In the 'Residuals vs Fitted' graph we can see that the residuals of the data points are randomly, yet evenly, distributed around the 0 horizontal line. This supports our assumption of a multiple linear regression, as it shows a linear relationship between the variables and a constant variance. The residuals of a multiple linear regression model are the differences between the observed data of the dependent variable y, the test score, and the fitted values ŷ given by the model. This graph has few residuals with absolute value greater than 6. These stand-out residuals suggest that there are a few outliers in our data set. In the 'Normal Quantile-Quantile' graph, the residuals do stray slightly from the 45-degree line. A perfect y = x line suggests a perfect fit.
However, this is not drastic, and the majority of the standardised residuals, especially those around 0, sit closely to the line. This gives us further support in our model assumptions of linearity and normal distribution. From the 'Residuals vs Leverage' graph in Figure 27, we can see that there are two influential observations, one marked 20 and the other 44. Leverage is a measure of how much each data point influences the regression model. A standardised residual point with a high leverage can influence the response of the regression model. Nevertheless, both of these points have a Cook's distance (a commonly used estimate of the influence of a data point) of less than 0.5, and hence are not influential enough to have an impact. They would be considered too influential if their Cook's distance was greater than
1. All of this brings us to the conclusion that all the assumptions of the regression appear to be upheld and that the model fitted is good and can be relied upon.

Figure 27: Top left plot is the density estimate of the residuals. Top right plot is the residuals versus the fitted values. Bottom left is the normal quantile-quantile plot of the standardised residuals. Bottom right plot is the standardised residuals versus their leverage.

Using stepwise regression in R determined that a suitable reduced model is one involving only the GCSE grade. The new model is Y = 17.3929 − 4.6151X1 − 5.6429X2, where the only variable, X, is GCSE mathematics grade. We interpret this as follows: a participant with an A* grade at GCSE mathematics will achieve a score of 17.39, a participant with an A grade will achieve a score of
12.78 (17.3929 − 4.6151 = 12.7778) and a participant with a B grade will achieve 11.75 in our numerical assessment. The explanatory variables not selected are not necessarily unrelated to the response variable; they simply do not add more information than is already provided by the GCSE mathematics grade variable. It could be said that there was multicollinearity in our initial model, hence why this model with only one variable is viewed to be the 'best'. Multicollinearity is when variables are highly correlated and one variable can be used to predict another. GCSE mathematics grade, degree, and years since studying maths could all be highly correlated, because someone taking a humanities degree will not have studied mathematics since secondary school. By eliminating the other variables, we remedy multicollinearity. The p-value for GCSE (as shown in the ANOVA table in the Appendix Section in Figure 86) is 4.708e−05, which is smaller than 0.01. Therefore, at the 1% significance level, the coefficient for GCSE mathematics grade is significantly different from zero and so strongly affects the test score result. The adjusted coefficient of determination is 0.3177, meaning that about 31.77% of the variation in the test score is explained by the GCSE mathematics grade. As before, the graphs in Figure 28 help us analyse the goodness of fit of the model. The 'Density Estimate of Residuals' graph appears to be bell-shaped. However, the normal distribution assumption does not appear to be followed as well as in our initial model. There is more of a negative skewness, as the peak is further to the right, not centred about zero. The linear model assumption may also not be supported. The 'Residuals vs Fitted' graph shows residuals not evenly and randomly distributed around the zero line, which counts against linearity and constant variance. Additionally, the data points in the 'Normal Quantile-Quantile' plot do not sit closely to the 45-degree line.
The plot follows the direction of the y = x line, but not as closely as in our initial model with all the variables. Finally, in the 'Residuals vs Leverage' graph, there are again two potentially influential observations. However, in contrast to our initial model, one of these observations (point 44) has a Cook's distance greater than 0.5, so this data point could be considered to have some influence on the regression model. As its Cook's distance is still less than 1, it is not a significant influence.
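The Cook's distance values referred to above can be computed directly from each observation's residual and leverage. The following is a minimal sketch for the simple linear regression case, using a small illustrative data set rather than our survey data; the function name and the numbers are our own and are included only to show how an influential point is flagged.

```python
def cooks_distance(x, y):
    """Cook's distance for each observation in a simple linear fit y = a + b*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    p = 2                                         # parameters fitted (intercept, slope)
    s2 = sum(e * e for e in residuals) / (n - p)  # residual mean square
    out = []
    for xi, e in zip(x, residuals):
        h = 1 / n + (xi - xbar) ** 2 / sxx        # leverage (hat value) of this point
        out.append(e * e / (p * s2) * h / (1 - h) ** 2)
    return out

# Illustrative data: the last point has high leverage and lies far from the trend,
# so it receives by far the largest Cook's distance.
d = cooks_distance([1, 2, 3, 4, 5, 10], [2, 4, 6, 8, 10, 0])
```

A value above 1 (as for the last point here) marks a strongly influential observation, which is the same 0.5 and 1 rule of thumb applied to point 44 above.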
Based on these results, the model may not fit perfectly. The initial model with all four variables appears to be a better fit than the regression model created using stepwise regression.

Figure 28: Top left, the density estimate of the residuals. Top right, the residuals versus the fitted values. Bottom left, the normal quantile-quantile plot of the standardised residuals. Bottom right, the standardised residuals versus their leverage.

Our findings show that the variables degree, years since studying maths, GCSE mathematics grade and type of learner all affect the score achieved by the participant. Therefore, all these variables should be noted and taken into consideration when employers use numerical reasoning assessments in job applications. However, the model with just GCSE mathematics grade has a higher coefficient of determination and a variable significant at the 1% level. Therefore, just looking at GCSE mathematics grade
on its own would be a good indicator of the test score result. Through our two models we can now predict the score a particular participant will achieve. What will a participant score if they achieved an A in GCSE mathematics, are studying Languages at degree level, have not studied mathematics for 3 years and are an auditory learner? From our initial model they would achieve a score of 13.16; from the stepwise regression model, a score of 12.78. The models give similar predictions, both rounding to 13.

To conclude, even though we found that our test results were slightly negatively skewed, and that the average age of participants was 20.24 (so they might not represent the entire population), our findings show that a strong mathematical background can result in a significantly higher numerical test score. As can be seen in Figure 20, those that achieved an A* in GCSE Mathematics have a higher mean average than those that achieved a B grade. Our findings also support our hypothesis that visual learners would achieve a higher score, on average, than auditory or kinaesthetic learners; visual learners taking our test had the highest average and smallest range of scores. Thus, it is clear to see why employers use numerical tests to filter out weaker candidates in a job application process. There may be limitations in our data because there are other variables, which we have not considered, that could affect numerical reasoning test score: for example, literacy skills, learning disabilities, A level Mathematics grade (where appropriate) and the number of practice numerical reasoning assessments completed.

3.3 Feedback Findings

Our final meeting with Dr Ben Youngman concluded that our approach to the model was 'logical' and appropriate to be distributed to other external sources for feedback. This was helpful in reassuring us that our findings were significant and would be useful for employers to see.
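The stepwise-model prediction quoted above can be reproduced directly from the fitted coefficients. The sketch below assumes treatment coding with A* as the baseline grade, so the intercept (17.3929) is the predicted score for an A* candidate and each other grade's coefficient is an offset from it; only the A-grade offset (−4.6151) is quoted in the text, so the B-grade offset shown here is back-calculated from the predicted B score of 11.7500 and is our own inference.

```python
INTERCEPT = 17.3929          # predicted score for the baseline grade (assumed A*)
GRADE_OFFSET = {             # offsets from the baseline for each GCSE grade
    "A*": 0.0,
    "A": -4.6151,            # quoted coefficient from the stepwise model
    "B": 11.7500 - 17.3929,  # inferred from the predicted B score of 11.7500
}

def predicted_score(grade):
    """Predicted numerical reasoning score from the one-variable stepwise model."""
    return INTERCEPT + GRADE_OFFSET[grade]

print(round(predicted_score("A"), 2))  # the 12.78 quoted in the text
```

This makes explicit why the stepwise model needs only the GCSE grade: the prediction is a single lookup plus an addition.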
The team went to see Rowanna Smith, the lead Careers Consultant for the College of Engineering, Mathematics and Physical Sciences, based in the Career Zone at the university, to receive feedback on our report, as previously mentioned. Her response was positive overall, commenting that our regression model looks broadly correct. She thought that having a resource similar to our numerical reasoning assessment made available for students would help 'bridge the gap' between the type of tests they are possibly used
to in their educational career and the type used by employers. She also said that she would be likely to provide workshops to aid non-visual learners in preparation for reasoning tests, since our findings showed that numerical reasoning tests used by employers favour visual learners. When asked if she thought our report could have an impact on the companies who make numerical reasoning tests, she responded that it would be incredibly difficult to adapt these tests for all types of learners: they are a cheap way for employers to filter out a large number of applications, and it would be unlikely that this would change in the future.

Human Resources is the key department in the job application process. It is responsible for the selection of applicants, from processing the initial online application and issuing online reasoning tests, all the way through to assessment centres and company interviews. We thought contacting the Human Resources departments at KPMG and at the University of Exeter would give us an idea of whether the information presented in our findings would:

1. Make them recognise that different types of learners perform differently in online reasoning tests, putting some people at a disadvantage.
2. Lead them to consider having different options for different types of learners.
3. Encourage them to provide practice tests with feedback.
4. Make them question whether it is fair to use numerical reasoning tests as a filtering process, when GCSE mathematics grade significantly affects the test score.

Unfortunately, however, they did not respond, so we cannot draw conclusions on these questions from a Human Resources point of view. This could be investigated further in the future. In addition, we made contact with SHL, a company that creates reasoning assessments.
We wanted to see whether, in light of our findings, they would consider adapting their tests with the four questions above in mind. Again, regrettably, we received no response.
4 Conclusion

As illustrated throughout our findings, we drew many parallels with our research. In our project, we set out to look into current online assessments, with the aim of creating our own test that improves on what is already available. As mentioned previously, we decided to focus specifically on numerical reasoning tests to give us an achievable goal. The objectives of this project were to research the following topics: how people learn in online assessments; why employers test for numeracy skills; the difference between numeracy and mathematics and why they are both studied; how mathematical ability deteriorates over time; and finally, the different types of learners and the effect on performance in numerical reasoning assessments. As mentioned before, a survey was also carried out to get opinions from students who undertake these tests. Based on this research, an online test was developed to try to resolve some of the issues found, such as the lack of feedback and the lack of tests that cater to different types of learners. Once the test was developed and the data analysed, we could conclude that the aim of the project had been met. However, due to limitations throughout our project, such as the short time frame and our lack of experience with the software used, there is still scope for further research.

When looking into whether mathematical ability deteriorates with age, we found that there was a strong link between the years since studying mathematics and overall performance in our online test. We saw that, generally, the longer it had been since participants had studied mathematics, the worse they performed in our test. We can conclude from our statistical findings that a student's mathematical ability will deteriorate by a measurable amount each year after studying mathematics.
This is matched by the data we received from our survey, where we found that 39% of people believed their mathematics skills had deteriorated, and the majority of the remaining people held a neutral stance, meaning that more people agreed with the statement than opposed it. This concurs with our original hypothesis, formed from the research in the preliminary findings section. Relatedly, our hypothesis that those who study mathematics or more science-related degrees would achieve a higher score in our numerical reasoning test was confirmed. We saw that the mean score of those studying humanities and language degrees was significantly lower than the mean score of those on science-based degrees (e.g. economics, biology, medicine).

We then looked at our results in the context of the 'Mathematics vs.
Numeracy' debate. Our findings suggest that although our test was numerically based, there is still a link between mathematics and numeracy, since scientific-based subjects performed better. In addition, we found that 67.5% of people believed mathematics and numeracy to be different disciplines in their own right, demonstrating that a perceived difference exists. All of the above concurs with our research, which suggested that numeracy is the real-world application of mathematics. Despite having differing definitions and perceived meanings, the two inherently rely on one another, in agreement with our previous hypothesis.

From our research we found that visual learners tended to achieve higher scores than other types of learners, such as auditory and kinaesthetic learners. Conclusions drawn from our original research fully support this. This was due to the fact that visual learners perform better on visually aided questions in graph or table form, making it easy for them to absorb visual information. Therefore, numerical tests could put other types of learners at a disadvantage. Further analysis found that 64.1% of people agreed that numerical reasoning tests only cater for visual learners.

Through our findings we successfully managed to collect and statistically analyse data for all of our chosen research hypotheses. From this we observed significant trends and results in relation to our original problems, formulated from the preliminary research carried out. We believe all our findings are statistically significant and solidly support our research.

We also believe that, given their relevance and clarity, our findings could have several wider implications. These include the way in which people prepare for psychometric tests. Our finding that people prefer to learn through feedback could change the way companies, employers and education centres approach preparation for numerical reasoning assessments.
Organisations assisting with the preparation for these tests could improve their own practice tests by including this style of feedback, enabling participants to learn from their mistakes. In addition, extra help and attention could be given to non-visual types of learners, as visual learners have been found to perform better on numerical reasoning assessments than others. This could help overcome any potential disadvantages they may face when entering employment.

Our findings highlight the necessity of numerical skills in the wider world and in all professions. As major employers confirm it is an important topic, there seems to be definite scope for a universal definition of numeracy. Our findings have also stressed the need for the UK Government to consider the
addition of, or increased emphasis on, the teaching of numerical skills at school, in order to benefit future employability. GCSE mathematics grade was found to be a factor affecting people's performance, and the strong link between numeracy and mathematics reinforces the need for a strong education in mathematics. In addition, our project has provided scope for continued research within this area, for instance through further and more extensive tests. Our findings are also significant enough for us to believe that there are many different factors that determine one's ability in a numerical reasoning test.
5 Evaluation

5.1 SWOT Analysis

The following section concludes our project and provides an overall evaluation. In order to ensure that we form a comprehensive, reliable and valid conclusion, and hence evaluation, we shall utilise a well-renowned method of evaluation analysis known as SWOT. This acronym stands for Strengths, Weaknesses, Opportunities and Threats [55]. Using this type of evaluation structure we aim to provide a well-rounded, detailed, but most importantly honest evaluation. In our experience the vast majority of evaluations focus only on strengths and weaknesses. The SWOT approach allows us to reflect more on the threats that existed within this project and to provide an important section on the future opportunities our project could lead to.

5.1.1 Strengths

1. Teamwork

As a team we have worked extremely well together. There has very rarely been any disagreement about the direction that our project should take, and when there was, each member was able to express and explain their views and opinions openly, without restriction or fear of doing so. Each team member has played a role in helping to inform and guide other team members whenever they were unsure of what they needed to do. Overall, our team has been extremely committed, endeavouring not only to complete this project on time but, importantly, to go above and beyond, completing it to a standard that we are proud to call our own. As a team we maintained extremely good communication and enjoyed spending time with each other.

2. External Professional Opinions

One thing that all team members were very much in favour of was seeking the opinions of professional mathematicians, such as our project supervisor, Dr Barrie Cooper, and an academic from the University's Statistics Department, Dr Ben Youngman. We believed these external professional opinions to be critically useful in guiding our project to achieve our aims.
The team was thankful that Dr Barrie Cooper took the time to be present at weekly meetings with us, so we were able
to keep him constantly updated on our progress, ideas and findings. He consistently provided us with useful feedback, allowing us to develop and produce a higher-quality report.

3. Individual Group Member Skill Sets

One of the aspects that we feel our group has benefited from is the combination of the various skill sets that each member brings to the team. From our experience of working together we have been able to see firsthand what skills our team members possess, so we could collaborate effectively, utilising our skills in the most efficient way. Some of our members held very strong programming skills, whereas other members possessed strong statistical skills. We have all enjoyed gaining experience from each other's varying skills, giving us an opportunity to improve and develop new skills. This is particularly important as our academic studies have become much more specialised in the final year, possibly causing us to neglect broader employable skills. Each team member's background, both academic and personal, has helped to make our team a diverse and driven one.

4. Team Members' Passion

An important strength of our project is the fact that all members were in some way passionate about the area on which our project focused, making them far more motivated as a result of personal interest. Furthermore, being aware of how essential numerical assessments, and one's ability to solve them, are to graduate careers, we were motivated to undertake this project. A few of us have a view to going into the teaching profession, whereas others are looking to go into the financial sector, or indeed into further study in computer science. Due to the variety of skills required, every member was able to play to their own personal strengths and interests, allowing them to get the most out of the group project.
5.1.2 Weaknesses

On completion, as a group we reviewed our whole project and, although we were very happy with the outcomes, there is clear room for improvement.

1. Timing
Although overall we found that the team worked efficiently to get each task completed by the relevant deadline, strictly following our critical path, timing was still a weakness in this project. This was particularly related to the programming side of our investigation. From the outset, we had highlighted the programming as the most crucial and potentially critical aspect of our project. Due to various problems occurring in our program, which are explained further on, and having to seek the guidance of our project supervisor, Dr Barrie Cooper, in order to debug and resolve errors, time was lost at the end of the project. This put pressure on having to write up our conclusions and findings in a much tighter timeframe. We are aware that having the success of our project rest on one volatile task was a weakness. However, since the programming was paramount to the data collection on which our project was based, we are content to argue that this was unavoidable.

2. Programming Restrictions

Unfortunately, due to our level of programming skill and the time frame provided, some coding of the questions was problematic. This meant we had to put certain restrictions on some of the questions, such as those including static images not being randomised. However, we felt that this did little to limit our project, as only a small proportion of our questions were affected.

3. Programme Efficiency

Our online assessment played a critical role in the collection of data ready for analysis. In order to create this online assessment we first had to write, develop and, importantly, debug the program code until it was at a working level that we felt would efficiently fulfil the purpose it was created for. Although our code did just this, on reflection our team felt that, given a longer period of time, we would have benefited from learning more advanced programming skills.
We would thus have been able to improve the quality, and as a result the potential, of our code as a vehicle to explore our research hypotheses. As students and not professional programmers, although we did achieve working code, much to the team's satisfaction, it is unlikely to be the most efficient code. For example, our code comprised a series of 48 independent scripts that used POST variables in PHP to communicate, link and run through each script in turn. More advanced, professional-level systems would never run on a series of independent scripts, each dependent on the one before and after it. There were advantages to this: for a small programme like ours, it made the programming more manageable and achievable in our time frame. The disadvantage lay in our programme's limited potential. A numerical assessment in a professional company could comprise thousands of scripts; if they wanted to edit just one of these, then the entire programme would fail until they fixed the script in question. If we wanted to improve our program we could have employed newer and more advanced programming techniques by engaging with further material, but this would have involved more time and a complete alteration of our code. Therefore, we felt that this was a weakness we would have aimed to address to improve our project findings, but we also realised that it would have been an unrealistic target. In addition, we felt that having a way to stop people from navigating back through their web browser would have benefited our programme, as it would have prevented candidates altering their previously submitted answers. We were aware that this could have been achieved with some JavaScript code. However, given our constraints we were not able to address this weakness, and any changes applied once the test had been launched would have affected the significance of our collected data. Instead, we politely asked users to navigate through the test using only our programme, which did not provide an option to return to previous scripts.

4. Statistical Restraints

Since a fundamental aspect of our project was both the collection and analysis of data, we were required to be statistically rigorous in our collection of this data.
We spoke to an academic within the Department of Statistics at the University of Exeter to seek advice on matters such as the types of questions we needed and how many would give us the most useful and, most importantly, significant data. We originally strove to program a levelled system of 25 questions in total. The test would be split into 5 sections testing differing numerical abilities, each section consisting of 5 questions. This meant that should a candidate get a question wrong, we would give them another question of the same level, generated from question banks. However, if they got
this correct, then we would give them a question from a bank that was slightly more difficult. We would restrict this process to 5 questions for each section. This would have enabled people to gain further practice in areas in which they struggled and to ensure a thorough understanding before progressing. After we suggested this, the academic advised strongly against it, since levels make the statistical analysis harder due to having many categories. He explained that, under this system, the level a candidate reached would determine their score, rather than comparing raw scores, and that it might affect our ability to measure the role of other factors, like age, discipline and gender. Therefore, we did not design this system of questions, as it would have critically restricted both our ability to analyse the data against our factors and our ability to trust that our findings were indeed reliable and significant.

Furthermore, we were aware that the people who took the test probably would not be representative of our target demographic. This was because our team consists primarily of mathematics students, and most of our connections were made through our degrees, making it difficult to ensure an equal spread of respondents across disciplines. Gender had a similar effect: for example, the female members of our group contacted more females to take our test, and a similar argument applies for male participants. A final statistical weakness we were aware of was the lack of effort and attention possibly paid by people who took the test. We expected that university students would have many social engagements and prior commitments, so we were aware that their effort would be unlikely to match the standard applied when sitting a real job application. To make it a fairer test, perhaps we should have, after researching the time of day for optimal mental performance, asked each candidate to sit the test at that time.
We could also have arranged set places to sit the test, to better replicate the kind of focused conditions people experience when sitting real tests.

5. Norm Groups

Another evaluative point is that we did not use norm groups when evaluating participants' scores. We did not take into account that, when applying for a job role, raw scores are not considered by employers but are instead compared to appropriate norm groups. Employers would normally compare a candidate against a norm group
of similar demographics. The score of a candidate of a certain demographic taking a test would be compared against previous candidates of that same demographic, and not against everyone who took the test in general. By not having used a norm group, our findings may not show the same results as if we had used one. However, considering our sample size, it would not have been possible to do this, as there would not have been enough data to create these groups.

5.1.3 Opportunities

1. More Adaptive

As explained in detail in the 'Programme Efficiency' part of the 'Weaknesses' section, we definitely see opportunities to develop our code further. In particular, we wished to make our programme more adaptive. Adaptive tests work by changing the questions a candidate faces based on their performance. Motivated by this, we could strive to develop code able to detect the slightest difference between two candidates' abilities and give them alternatively levelled questions as a result. This would ensure the assessment was tailored to benefit the candidate by practising the questions they struggle with. There are many variables which a more advanced programme could use to pick up on variations in ability. For example, even if two candidates both gave the correct answer, by recording the time taken to answer the same question the programme could determine which candidate has the better ability, and thus assign relevant questions to that person. It should be noted that we did record the time it took candidates to answer questions; however, this was done manually, purely for purposes of data analysis, and was not an automatic function of the programme. This is something we see an opportunity to develop. Further variables include recording whether a participant selects a wrong answer first but then changes it.
Of course, all of these variables are subject to statistical scrutiny, which we would endeavour to investigate via an appropriate academic of Statistics at the University.

2. Extensions to Literacy

We see great potential in extending our assessment to literacy as well as numeracy. This addition would be relatively simple
to employ, since we have already written a basic format that the code for the literacy test would follow. Extending our test to literacy would be an extra way to offer our candidates a well-rounded assessment, enabling them to gain comprehensive feedback on the two key skills, numeracy and literacy, which are tested heavily in the application processes of the working world. Furthermore, we feel that there is an opportunity here to undertake more advanced statistical research, asking questions such as 'Do maths students achieve better results in the numeracy test than in the literacy test, compared with English students?', or 'Do subjects that involve an equal amount of literacy and numerical skills, such as Business Studies, achieve relatively equal scores in both tests?'. There are many stereotypes about the skills students in various disciplines hold, and so extending our test to literacy would help us uncover whether these are true, and possibly find evidence of new and unexpected trends, e.g. that English students achieve better results in numerical tests on the whole than Maths students do.

3. Publication

As a team we are very proud of our project and feel that our statistical analysis and findings contribute in some way to the academic literature in this area. Given more time and the opportunity to work further with both our project supervisor, Dr Barrie Cooper, and the University of Exeter's Statistics Department, we would be interested in pursuing this area of research further. This would allow us to carry out a more detailed and comprehensive study into numerical tests, further extend our assessment to literacy, and deepen our analysis and potentially the significance of our findings to the real world. This would then allow us to create a paper ready for publication in an appropriate academic journal.

4. Links with the Career Zone to Share Our Resource

The Career Zone forms an essential support network, primarily there to assist students in gaining the careers to which they aspire. The Career Zone understands just how important a student's ability to perform in numerical tests is in order to achieve the best internships, placements and jobs. Rowanna Smith is the lead Careers Consultant for the College of Engineering, Mathematics and Physical Sciences, and she took great interest in the motivation of our research project. We
therefore see an opportunity to take our new resource to the Career Zone team and work with them to potentially offer it through their website, or internally, as a resource able to be accessed and used by students seeking help with numerical reasoning tests. We see this as a very rewarding way for us to benefit the university and its students: not only indirectly, through the wider implications of our findings, but also directly, for those students who participated in our studies.

5.1.4 Threats

1. Timing

Timing was an obstacle which we faced constantly throughout the development of our project. Various team members had prior commitments at times and so were unable to attend certain meetings. This meant that everyone was not always aware of all progress being made, at times causing confusion, in particular over the direction our project was taking and, more specifically, what tasks needed to be carried out as a result. It should be noted, however, that thanks to various social platforms, anyone who was absent had various methods of catching up and thus staying up to date.

2. University Internet Not Working

Our ability to use, access and set live our numerical assessment was heavily reliant on the reliability of the university internet. The SQL database which we used to store the data inputted by candidates sitting the test was kept on the university's server, along with the final programming code. This meant that if the server shut down, due to a problem or for maintenance, our programme would, first and foremost, be inaccessible to students wishing to sit it, and secondly, the data generated from that programme could not be downloaded from the servers ready for data analysis. We relied on the fact that the university backed up its files through professional methods.
The occasion which most prominently highlights the detrimental effect of the university's server failures on us came when we first attempted to make our programme, or website, go live. We were under time pressure to get the website with the assessment live; however, when we met with Dr Barrie Cooper to set it up, the university server had shut down due to an error which the
university was looking into. This meant we had to call off the meeting, setting back the date on which our programme went live and leaving us with less time both to get people to take the test and to analyse the data.

3. Programme Failure

Those experienced with programming were aware that the most time-consuming part of creating a working programme is debugging it. Debugging is a term used in computer science to mean the identification and removal of errors from computer hardware or software [16] [17]; in our case it was software. For the majority of the development of our programme we encountered countless errors. These formed big obstacles which were critical to the success of our project, as failure to resolve them meant that we would have no method of collecting data. An example of a big obstacle we faced in our programming was ensuring the correct variables were POSTed through the PHP code, which ensured we were able to make our assessment randomised. Furthermore, just before our assessment went live, we had to debug errors in the code which were preventing certain characters, such as pound signs or apostrophes, from printing correctly, and we further had to resolve issues with data being successfully inputted into the database. Dr Barrie Cooper was a great help in resolving these programme failures.

4. Student Participation

A big obstacle which we anticipated would arise, and which duly did, was getting enough students to participate in our research by undertaking our assessment. We found it a great challenge to get enough people to sit our test. The participants also did not always follow the protocol which we provided, for example navigating back via their browser, or using a search engine to look for advice on how to solve questions. As a group, we see that one way in which we could have encouraged more students to take the test, and to follow the protocol more rigorously, would have been to offer some kind of incentive.
5.2 Improvements

The first improvement we could make to our project is an adaptive test, in which the candidate progresses through levels of 'difficulty'
depending on their performance in the test. Our initial plan was to create a test where the next question given depends on whether the previous question was answered correctly. If the answer was correct, the candidate would move on to the next level of difficulty; if they got it wrong, they would complete another question at the same level of difficulty. Therefore, two people taking the test could end up having answered very different questions. After speaking to Dr. Ben Youngman, we found out that this would seriously affect the statistical analysis we had planned to perform, because the level of difficulty of a question affects the score more than any other factor. This is why, instead, we separated the test into three levels, KS2, KS3 and GCSE, and each candidate took the same questions. To improve the project, we could have done more research into this problem and tried to find a way of analysing data from an adaptive test. Our set questions were randomised by randomising values within each question. We were unable to randomise the numbers in the pie charts because, in the time available, we could not work out how this would be done in our PHP code. One improvement to our numerical reasoning assessment would be to randomise these as well. In addition, we could have used a larger variety of charts, for example a line graph or a bar chart. A further benefit to our project would have been an expanded sample, meaning more participants taking our assessment. This would have increased our sample size and better represented the whole population. A final improvement would have been to give feedback in a form that suits the type of learner taking the test. For example, an auditory learner would have benefited from an audio recording explaining how to answer the question.
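The adaptive scheme described above (move up a level on a correct answer, stay at the same level on a wrong one) can be sketched as follows. This is a minimal illustration, not our actual implementation, which was written in PHP; the function names and the way a candidate's answers are simulated are our own assumptions.

```python
# The three difficulty levels used in our test.
LEVELS = ["KS2", "KS3", "GCSE"]

def adaptive_test(n_questions, answers_correctly):
    """Sketch of the adaptive scheme: a correct answer promotes the
    candidate to the next difficulty level; a wrong answer keeps them
    at the same level. `answers_correctly(level)` stands in for the
    candidate attempting a question at that level."""
    level = 0
    history = []
    for _ in range(n_questions):
        correct = answers_correctly(LEVELS[level])
        history.append((LEVELS[level], correct))
        if correct and level < len(LEVELS) - 1:
            level += 1
    return history

# A candidate who always answers correctly climbs one level per
# question until reaching the top level.
path = adaptive_test(5, lambda level: True)
print([lvl for lvl, _ in path])  # ['KS2', 'KS3', 'GCSE', 'GCSE', 'GCSE']
```

The sketch also makes the statistical difficulty visible: two candidates can follow entirely different question sequences, so their raw scores are not directly comparable, which is why we reverted to fixed questions.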
A kinaesthetic learner would have benefited from interactive and physical feedback rather than the written step-by-step solution to the previous question. Our written feedback was suited mostly to visual learners.

5.3 Further Research

The first way we could further our research is to look into potential 'cheating'. Our numerical reasoning test was not taken under exam conditions, so it was not possible to prevent participants from cheating. However, this is the case with most numerical reasoning tests used by employers, so it may be unnecessary to mitigate against. The tests also do not allow you to know whether
a participant calculated the answer or just made a 'lucky guess'. On a multiple choice question with 5 possible answers, there is a 20% chance a participant will guess correctly. We could further our study of the data by creating a criterion to detect cheating and performing statistical analysis on it. Could you actually read, understand and answer a question in under 4 seconds? We would need to put forward, and answer, the question 'What is the minimum response time, in seconds, before a participant is flagged as cheating?' With additional research we could assess whether this could be modelled, for example, with a Poisson distribution (which models random events occurring at a constant rate). As we did not have anybody take our test more than once, we were unable to evaluate whether practising numerical reasoning tests can improve scores. Therefore, we could further our research by getting participants to retake our test and determining whether their scores improved. Although we timed how long it took participants to take our numerical reasoning test, there was no time limit and we did not show participants their overall time. In most online job-application tests there is a time limit, and the remaining time is usually shown in the corner of the webpage. We could further our research to see whether this affects people's scores, or whether participants become nervous under a time limit, causing them to panic and hence perform worse. We could also perform further research to determine whether time pressure improves test performance due to increased concentration. To investigate this we could ask people to take the test with a time limit and compare the statistics from this test with our original, for example by calculating whether the average score or standard deviation changed.
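The two ideas above, how likely a given score is under pure guessing and flagging implausibly fast responses, can be sketched as follows. This is an illustrative outline only: the 4-second threshold and the 5-option (20%) guessing rate are the figures discussed in the text, while the function names are ours, not part of our actual analysis.

```python
from math import comb

def prob_score_by_guessing(n_questions, n_correct, n_options=5):
    """Binomial probability of getting exactly `n_correct` of
    `n_questions` right by guessing uniformly among `n_options`
    answers (p = 1/5 = 20% per question for our test)."""
    p = 1 / n_options
    return comb(n_questions, n_correct) * p**n_correct * (1 - p)**(n_questions - n_correct)

def flag_suspicious(response_times, threshold_seconds=4):
    """Return the indices of questions answered faster than the
    threshold, on the basis that one can hardly read, understand
    and answer a question in under `threshold_seconds` seconds."""
    return [i for i, t in enumerate(response_times) if t < threshold_seconds]

# Guessing a single 5-option question succeeds 20% of the time.
print(prob_score_by_guessing(1, 1))  # 0.2

# Questions answered in 3.1s and 2.0s would be flagged for review.
print(flag_suspicious([12.5, 3.1, 45.0, 2.0]))  # [1, 3]
```

A criterion along these lines would let us attach a probability to any observed score under a pure-guessing model, and separately flag response times for the Poisson-style modelling suggested above.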
6 Bibliography

References

[1] Numeracy in practice: teaching, learning and using mathematics. Curriculum and Leadership Journal, 7(28):5, 2009.
[2] May 2013.
[3] Anders, Y., Rossbach, H., Weinert, S., Ebert, S., Kuger, S., Lehrl, S., von Maurice, J. Home and preschool learning environments and their relations to the development of early numeracy skills. Early Childhood Research Quarterly, 2011.
[4] P. Bell. Why and how to use psychometric testing. Recruiter, August 2015.
[5] Draper, N., Smith, H. Applied Regression Analysis, Third Edition. Wiley-Interscience Publication.
[6] Bynner, J., Parsons, S. Use It or Lose It: The Impact of Time out of Work on Literacy and Numeracy Skills. Basic Skills Agency, 1998.
[7] Bynner, J., Parsons, S. Qualifications, basic skills and accelerating social exclusion. Journal of Education and Work, 2001.
[8] CEB, SHL Talent Measurement Products. Assessment types.
[9] CEB, SHL Talent Measurement. Aptitude: identify the best talent faster and at less cost.
[10] S. Cook. Kinesthetic learning styles: 24 activities for teaching.
[11] S. Coughlan. University applications hit record high.
[12] B. Dattner. How to use psychometric testing in hiring. Harvard Business Review, Human Resource Management.
[13] Michele E. Davis. Learning PHP and MySQL. O'Reilly, 2006.
[14] Department of Mathematics, New York University. Why study mathematics.
[15] Department of Mathematics, The University of Arizona. Why study mathematics.
[16] Oxford Online Dictionary. Definition of 'computer hardware'.
[17] Oxford Online Dictionary. Definition of 'software'.
[18] Oxford Online Dictionary. Definition of 'mathematics', November 2015.
[19] Oxford Online Dictionary. Definition of 'numeracy', November 2015.
[20] Eastwood, K., Boyle, M., Williams, B., Fairhall, R. Numeracy skills of nursing students. Nurse Education Today, 2010.
[21] J. Faraway. Practical Regression and ANOVA using R. July 2002.
[22] Department for Education. Mathematics GCSE subject content and assessment objectives. Online PDF file, 2013.
[23] Department for Education. Mathematics programmes of study: key stage 3 national curriculum in England. Online PDF file, September 2013.
[24] Department for Education. Mathematics programmes of study: key stages 1 and 2 national curriculum in England. Online PDF file, September 2013.
[25] The Department for Education. The national curriculum in England: key stages 3 and 4 framework document, December 2014.
[26] R. Garner. EY says it will no longer consider degrees or A-level results when assessing employees. The Independent.
[27] The Guardian. GCSE results day 2015: pass rates rise as UK students find out grades – as it happened.
[28] The Guardian. Maths teaching revolution needed, November 2013.
[29] D. Hallet. How is numeracy different from elementary mathematics? University of Arizona, Harvard University, November 2014.
[30] K. Herbert. Bias in personnel selection and occupational assessments: theory and techniques for identifying and solving bias. International Journal of Psychology and Counselling, 5(3):38–44, 2013.
[31] Inside Careers, Specialist in Graduate Careers. Assessment centres: numerical reasoning tests.
[32] M. Zuckerberg, Facebook founder. Facebook has a billion users in a single day, says Mark Zuckerberg. Interview, BBC, August 2015.
[33] A. Jenkins. Companies' use of psychometric testing and the changing demand for skills: a review of the literature. Centre for the Economics of Education, London School of Economics and Political Science, 2001.
[34] S. Jordan. Assessment for learning: pushing the boundaries of computer-based assessment. Assessment in Higher Education Conference, Cumbria; Centre for Open Learning of Mathematics, Computing, Science and Technology (COLMSCT), The Open University, pages 1–12, July 2008.
[35] K. Lepi. The 7 styles of learning: which works for you? November 2012.
[36] R. Lowry. Computer aided self assessment: an effective tool. Chemistry Education Research and Practice, The Royal Society of Chemistry (RSC), 6:198–203, July 2005.
[37] AssessmentDay Ltd. Numerical reasoning test.
[38] Meadows, P., Metcalf, H. Does literacy and numeracy training for adults increase employment and employability? Evidence from the Skills for Life programme in England. Industrial Relations Journal, 39(5):354–369, September 2008.
[39] Mogey, N., Watt, H. Implementing learning technology: the use of computers in the assessment of student learning. Learning Technology Dissemination Initiative, pages 50–57, 1996.
[40] C. Neuhauser. Learning style and effectiveness of online and face-to-face instruction. American Journal of Distance Education, 16(2):99–113, 2002.
[41] National Numeracy. Why is numeracy important?
[42] J. O'Donoghue. Numeracy and mathematics. Irish Math. Soc. Bulletin, 48:47–55, 2002. Department of Mathematics and Statistics, University of Limerick, Ireland.
[43] OECD, Better Policies for Better Lives. Education report.
[44] Test Partnership. Numerical reasoning (N-ARA).
[45] Prospects.
[46] P. Robinson. Literacy, numeracy and economic performance. New Political Economy, 2007.
[47] C. Rowlands. How organisations get the best out of psychometric testing. Personnel Today in association with Network HR, 2015.
[48] LearningRx. Types of learning styles. 2015.
[49] Telegraph Staff. Employers receive 39 applications for every graduate job.
[50] Kinesthetic Learning Strategies. What are the best kinesthetic learning strategies?
[51] R. Stretton. Talent and capability consultant at RSA Insurance. Email conversation, October 2015.
[52] ESL Kids Stuff.
[53] BBC Recruitment Team. Recruitment. Email response.
[54] TES. Algebra – levelled SATs questions, July 2014.
[55] Mind Tools. SWOT analysis.
[56] A. Tucker. What is important in school mathematics. Technical report, Department of Applied Mathematics and Statistics, Stony Brook University, 2015.
[57] Indiana University. Academic enrichment. 2015.
[58] R. Vosburgh. The evolution of HR: developing HR as an internal consulting organization. Human Resource Planning, 30(3):11, 2007.
[59] w3schools.com. PHP 5 tutorial.
[60] Wylie, G., Head of the Mathematics Department, St Teilo's CW High School, Cardiff. Information on potential mathematics GCSE reforms. Verbal conversation, June 2015.
7 Appendix

The following Appendix is a compilation of extra files that are not directly relevant to this report but may still be both interesting and useful to the reader. Our survey comprises the following series of images.

Figure 29: Survey Pages.
Figure 30: Survey Pages.
Figure 31: Survey Pages.
Figure 32: Survey Pages.
Figure 33: Survey Pages.
Figure 34: Survey Pages.
Figure 35: Survey Pages.
Figure 37: Survey Pages.
Figure 38: Survey Pages.

Our numerical reasoning assessment comprises the following images.
Figure 39: Opening page of our Numerical Reasoning Assessment.
Figure 40: Participant Information Page for our Numerical Reasoning Assessment.
Figure 42: Question 1 Feedback.
Figure 44: Question 2 Feedback.
Figure 46: Question 3 Feedback.
Figure 48: Question 4 Feedback.
Figure 50: Question 5 Feedback.
Figure 52: Question 6 Feedback.
Figure 54: Question 7 Feedback.
Figure 56: Question 8 Feedback.
Figure 58: Question 9 Feedback.
Figure 60: Question 10 Feedback.
Figure 62: Question 11 Feedback.
Figure 64: Question 12 Feedback.
Figure 66: Question 13 Feedback.
Figure 68: Question 14 Feedback.
Figure 70: Question 15 Feedback.
Figure 72: Question 16 Feedback.
Figure 74: Question 17 Feedback.
Figure 76: Question 18 Feedback.
Figure 78: Question 19 Feedback.
Figure 80: Question 20 Feedback.
Figure 81: Closing page of our Numerical Reasoning Assessment.
The following series of images comprises screenshots of the R code used for the statistical modelling within this report.

Figure 82: Screen print of R code used for statistical analysis.
Figure 83: Screen print of R code used for statistical analysis.
Figure 84: Screen print of R code used for statistical analysis.
Figure 85: ANOVA for initial model.
Figure 86: ANOVA for model created by stepwise regression.