Queen Mary
UNIVERSITY OF LONDON
School of Electronic Engineering and Computer Science
_______________________________________________
A REAL-TIME GENERATED 3D HUMANOID AGENT TO
IMPROVE SIGN LANGUAGE COMMUNICATION
Project Report
Programme of Study, Computer Science
Nicolaos Moscholios
Supervisor, Dr Hatice Gunes
20th of April 2015
120690664
Abstract
Communication between hearing impaired and hearing individuals who do not
possess the ability to sign is seldom easily accomplished. This project attempts to
improve the access to sign language for people not affected by deafness. By making
the skill of signing easily available to a broader audience, contact between these two
communities would hopefully improve. Through this system, users will be able to
observe signing thanks to a 3D avatar generated in real time. With the use of
computer graphic techniques, animations are created on the spot. As a result, users are
able to change viewing angles to increase the learning experience and the
understanding of signing gestures. In addition to the avatar, the application also
implements a complete tutoring system with a range of lessons and tests to effectively
teach and assess users’ progress.
TABLE OF CONTENTS
1. INTRODUCTION
   1.1 Aims
2. REQUIREMENTS ANALYSIS
   2.1 Previous Projects
      2.1.1 Sign Language Structure
      2.1.2 Virtual Teaching of Sign Language
   2.2 Existing Similar Systems
      2.2.1 Duolingo
      2.2.2 Babbel
      2.2.3 Assimil and Living Language
   2.3 Survey
   2.4 Interview
   2.5 Summary
3. REQUIREMENTS SPECIFICATION
   3.1 LWJGL – 3D Animation
      3.1.1 Functional Requirements
      3.1.2 Non-Functional Requirements
   3.2 Tutoring Application
      3.2.1 Functional Requirements
      3.2.2 Non-Functional Requirements
   3.3 Summary
4. DESIGN
   4.1 Aims
   4.2 Pedagogical Approach
      4.2.1 Learning and Assessing
      4.2.2 Marking Scheme
   4.3 User Interface Design
      4.3.1 Avatar
      4.3.2 Low Fidelity Prototype
      4.3.3 Rendered Prototype
   4.4 User Interaction
   4.5 In-System Feedback
   4.6 Summary
5. IMPLEMENTATION
   5.1 3D Animation Module
      5.1.1 Design Choices
      5.1.2 Skeletal Animation
         5.1.2.1 Skeleton Structure
         5.1.2.2 World and Local Space
         5.1.2.3 Mesh Structure
         5.1.2.4 Joint Weights and Skinning Algorithm
         5.1.2.5 Animation Algorithm
      5.1.3 OpenGL Engine in LWJGL
         5.1.3.1 AnimationModule Class
         5.1.3.2 Display Lists
      5.1.4 Summary
   5.2 Development Environment
   5.3 High Level Design
      5.3.1 Model
      5.3.2 View
      5.3.3 Controller
   5.4 Low Level Design
      5.4.1 Model Classes
      5.4.2 View Classes
      5.4.3 Controller Classes
      5.4.4 Extra Classes
   5.5 Difficulties
      5.5.1 Inter-compatibility
      5.5.2 Visual Consistency
   5.6 Summary
6. TESTING
   6.1 System Testing
      6.1.1 Performance
         6.1.1.1 Loading Times
         6.1.1.2 Computing Load
      6.1.2 Heuristic Evaluation
   6.2 User Testing
   6.3 Summary
7. CONCLUSION AND FUTURE WORK
   7.1 Summary and Conclusions
   7.2 Future Work
      7.2.1 Enhancements
      7.2.2 Improvements
      7.2.3 Expansion
8. BIBLIOGRAPHY
9. BACKGROUND READING
10. APPENDIX
   10.1 Survey Questions
   10.2 Interview Questions
   10.3 Profiler Results
   10.4 3D Animation Module Class Diagram
   10.5 Signer Class Diagram
1. INTRODUCTION
For most people around the world, communication through speech comes as a natural
result of human development. From a young age we are encouraged to speak, and the
environment around us drives us to use our voice as a medium of communication.
However, some of us are not as fortunate and are born deaf, and must therefore rely
on sign language to achieve what hearing individuals achieve through speech. The
issue is that the hearing and deaf communities have very few means available to
rapidly convey information to each other. Consequently, the best solution is for
hearing people to learn sign language.
Sign language is known for being very difficult to learn; experts compare it to
learning Japanese from scratch. For several years, scientists have tried to make the
teaching of sign language more accessible through technology, and while virtual
tutoring has existed for years in the form of pre-recorded video lessons, only now
are we seeing a shift towards 2D and 3D animated tutors, thanks to advances in
research and the abundance of tools available. Given that virtual tutoring for sign
language is a niche market compared to spoken languages, it is not surprising that
the majority of projects tackling this issue are incomplete or no longer available.
As a consequence, hardly any hearing individuals learn sign language through the
existing tools.
1.1 Aims
The aim of this project is to make British Sign Language available to the broader
audience by implementing a tutoring system able to trespass the barrier of pre-
synthesized video. Through this system, individuals interested in gaining the skill will
be able to learn by observing and performing, and subsequently improve by practicing
and assessing themselves. A virtual avatar will be used as a medium to demonstrate
signs; with 3D animation, users are allowed to move and observe what is being taught
from different perspectives, increasing their understanding of gestures and learning
experience.
The pedagogical approach to deliver this complex skill is discussed in Section 4.2 of
this report, while the development and creation of a 3D avatar that can support sign
language as a form of interaction is covered in Chapter 5. A requirement analysis was
undertaken to help identify the current systems and technologies that may be used for
this task (Chapter 2). Additionally, Chapter 6 will cover system and user testing to
assess the effectiveness and usability of the envisioned idea.
2. REQUIREMENTS ANALYSIS
The requirement capture for this project included:
• Research into previous projects, mostly from other universities and
conference proceedings
• Review and evaluation of existing systems, including applications for sign
language support as well as language tutoring programs
• An online survey of the general public, gathering feedback on existing
learning techniques and stakeholders' views on design ideas
• An interview with Prof Bencie Woll, Chair of Sign Language and Deaf
Studies at University College London
This section examines how past projects approached the implementation of a tutoring
system for sign language, identifies the mistakes to be avoided, and considers how
existing systems could be improved and adapted to the teaching of such a difficult
skill.
2.1 Previous Projects
A great deal of research is currently being carried out on sign language
recognition, but far less on sign language generation. Similarly, a large number of
online tutoring systems are available for spoken languages, while the few targeting
sign language are not very appealing.
2.1.1 Sign Language Structure
Generation of sign language is one of the main objectives of this project; research
into methods and techniques to achieve this was therefore necessary. Delorme et al.
[1] describe sign language as a group of signs divided into timing units that should
be treated separately. Because SL is a natural language, signs are formed according
to syntactic and spatial-temporal rules; Chinese works in a similar fashion. This
differs from spoken languages and transcription systems such as Braille, where the
structure can be subdivided into segments like sentences, words and even smaller
units. In addition, the two language families use different modalities:
visual-gestural vs. auditory-vocal. The Zebedee approach used in their project
divides signs into timing units, where each unit is either a key posture or a
transition.
Each key posture (key frame) is specified with a duration in milliseconds. This is
important because signs characterised by the same movement can change with context.
For example, the sign for a tall building (in French Sign Language) is made by both
hands moving upwards roughly 20 cm apart; by changing the distance between the
hands, the building can be described as thin or wide.
[1] Delorme M, Filhol M, Braffort A. Animation Generation Process for Sign Language Synthesis. In: 2009 Second International Conferences on Advances in Computer-Human Interactions; 2009: IEEE. pp. 386-390.
Similar examples can also be found in BSL (British Sign Language). While on the
subject, it is important to note that there are different kinds of sign language
around the world, just as there are many spoken languages in different countries.
The BSL family includes BSL, Australian Sign Language and New Zealand Sign Language.
On the other hand, "American Sign Language and Irish Sign Language belong to the LSF
(langue des signes française) family, which is unrelated to BSL, and BSL and LSF are
not mutually intelligible" [2]. There is also an International Sign language, which
has not been standardised. Since this project is being developed in a British
English speaking country, it was decided to use BSL.
Returning to Delorme et al., because each sign has its own transitions from starting
posture to end, XML was used to describe these variables. A skeletal system was set
up with each bone containing a set of variables, and Figure 1 illustrates how each
sign is translated from XML to the final animation.
Because XML files have a hierarchical structure, they are a natural fit for
describing a bone system: skeletons are used to build characters in any type of
animation, with each bone having a parent (except for the root) and any number of
children. Storing the structure hierarchically makes it possible to save each sign
in a single file, which makes it easier to modify, although it can cause redundancy
when many signs share similar patterns.
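To illustrate how such a hierarchy might be held in memory once parsed, the short
Java sketch below models a bone tree with a parent-child structure and a local
transform per bone. The class and field names are purely illustrative; they are not
taken from Delorme et al. or from this project's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal sketch of a hierarchical bone structure (illustrative names only). */
class Bone {
    final String name;
    final List<Bone> children = new ArrayList<>();
    // Local scale/rotation/translation relative to the parent bone,
    // as it might be imported from a hierarchical XML description.
    double[] localSRT = new double[16];

    Bone(String name) { this.name = name; }

    Bone addChild(Bone child) {
        children.add(child);
        return child;
    }
}

public class SkeletonSketch {
    public static void main(String[] args) {
        // Every bone except the root has exactly one parent.
        Bone root = new Bone("spine");
        Bone shoulder = root.addChild(new Bone("rightShoulder"));
        Bone elbow = shoulder.addChild(new Bone("rightElbow"));
        elbow.addChild(new Bone("rightWrist"));
        System.out.println("Chain under '" + root.name + "': shoulder -> elbow -> wrist");
    }
}
```

Serialising a tree of this shape to XML naturally yields the kind of one-file-per-sign
hierarchical description discussed above.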
Other projects have developed similar techniques to generate sign language [3]. At
the University of Campinas, Brazil, students created an avatar that generates sign
language in real time [4]. Figure 2b shows the mesh and bone system used to display
the signs.
[2] DCAL, University College London. [Online]. 2014 [cited 2014 September 22]. Available at: http://www.ucl.ac.uk/dcal/faqs/questions/bsl/question6.
[3] Yi-Chun Lin, James Jiunn-Yin Leu, Jyh-Win Huang, and Yueh-Min Huang, "Developing the Mobile 3D Agent Sign Language Learning System", in The 6th IEEE International Conference on Wireless, Mobile, and Ubiquitous Technologies in Education, 2010, pp. 204-206.
[4] Wanessa Machado do Amaral, José Mario De Martino, and Leandro Martin Guertzenstein Angare, "Sign Language 3D Virtual Agent", FEEC, University of Campinas, Department of Computer Engineering and Industrial Automation, Campinas, PhD Thesis.
Figure 1: Translation from XML to BioVision Hierarchical data file. From Delorme et al.
Their argument for using a virtual character instead of pre-recorded video is that
it does not require expensive filming equipment; there is no need to hire a signer,
which also preserves anonymity; and it requires far less storage space, since the
files are plain text rather than video. In addition, it gives the viewer the freedom
to change the point of view of the character while it signs.
In their project they follow the STOKOE system, the first standardised sign language
notation system. STOKOE uses three elements to describe a sign:
1) Tabula: the hand location in space
2) Designator: the hand shape
3) Signation: the hand movement
The hand movements are described in a hierarchical notation, with the sign class as
the root and Hold, LocalMovement and GlobalMovement as its children. Hold and
LocalMovement have three additional children each: LocalMovement represents the
forearm, hand and wrist movements, whereas Hold defines the left and right sides and
the head. GlobalMovement is the trajectory between two Holds within the same sign,
and therefore contains variables such as Speed, Orientation and Repeat to create a
movement or pattern for the body elements to follow. This kind of structure allows
the model to be altered simply by adjusting the variables contained in each bone. As
seen in Figure 2, the character is a realistic representation of a woman, created in
Autodesk Maya and exported as an FBX ASCII file, whereas the skeleton structure is
exported as an XML file storing the joints and angles needed to achieve the
different hand shapes.
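As an illustration only, the sketch below models a sign description along the lines
of the hierarchy just described. The class and field names, and the example values,
are hypothetical and are not taken from the cited thesis.

```java
import java.util.List;

/** Illustrative sketch of a hierarchical sign description (hypothetical names). */
record Hold(String leftHandShape, String rightHandShape, String headPosture) {}

record LocalMovement(String forearm, String hand, String wrist) {}

record GlobalMovement(double speed, String orientation, int repeat) {}

record Sign(String gloss, List<Hold> holds,
            List<LocalMovement> localMovements,
            List<GlobalMovement> globalMovements) {}

public class SignDescriptionSketch {
    public static void main(String[] args) {
        // Two holds linked by one global trajectory; changing a single variable
        // (speed, orientation, repeat) alters the generated animation.
        Sign hello = new Sign("HELLO",
            List.of(new Hold("open-B", "open-B", "neutral"),
                    new Hold("open-B", "open-B", "neutral")),
            List.of(new LocalMovement("static", "wave", "rotate")),
            List.of(new GlobalMovement(1.0, "outward", 1)));
        System.out.println(hello.gloss() + ": " + hello.holds().size()
            + " holds, " + hello.globalMovements().size() + " global movement(s).");
    }
}
```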
A third project, from the University of Tunis [5], achieved something very similar,
but was developed for the mobile platform. Using an avatar and a database of X3D
files, it allows users to translate text from the browser or other applications on
their phone. Their objective is also to overcome traditional video limitations
related to bandwidth constraints and video merging. In addition, thanks to the
complementary database, communities can build their own dictionaries, making it very
flexible in terms of available sign language expressions.
[5] Boulares Mehrez and Mohamed Jemni, "Mobile Sign Language Translation System for Deaf Community", in 21st International World Wide Web Conference, Lyon, 2011.
Figure 2: Picture of the mesh (a), skeleton (b), closed mesh (c). From the University of Campinas.
Using SML, a modified version of XML, they store information for each joint, with a
name and an initial rotation value. When the user requests a translation, the data
is retrieved from the database, the SML information is translated into an X3D file
to generate the animation, and finally the animation is converted to video and shown
to the user. The project goes into greater mathematical detail about how the
animation is constructed and the techniques employed; these are discussed in the
Implementation chapter of this report.
One of the few successful applications created to ease communication between deaf
and hearing individuals is MotionSavvy [6]. This unique project, created by an
all-deaf team, allows signs to be performed in front of a custom-built tablet case
containing a Leap Motion controller. The movements are then translated into spoken
language, which the tablet outputs as sound. This makes communication possible
between the two users; however, the product is aimed at fluent signers who need to
interact with non-signers, and does not actually teach the language.
It is worth noting that the majority of the work done on sign language generation
follows a broadly similar pattern: the 3D model is created as a single object, and a
skeleton modifies the mesh by changing the positions of the points in space that
constitute it. Although most of these projects are able to generate sign language,
they are not aimed at teaching it. The learning process is very slow for non-hearing
individuals; in fact, four years of learning for deaf children is equivalent to one
year for hearing children [7]. There are multiple projects that tackle the issue of
sign language being so difficult to teach. For students, having a signer come to
their school can be expensive, and not every adult has the time and budget to take
evening classes with personal tutors. The next section therefore analyses existing
techniques used to teach sign language virtually.
2.1.2 Virtual Teaching of Sign Language
There are plenty of applications on the Internet to assist the teaching of sign
language, from online tutorials with real teachers to simple applications showing
illustrations of signs. At the Technical University of Istanbul [8], students
employed a humanoid robot to assist the teaching of SL to deaf children through
interactive games. The robot narrates a short story and the children have to
identify certain keywords (in this case gestures) included in the signing. They then
have to show the flashcard corresponding to the word, which the robot recognises as
correct or incorrect. There is an element of visual feedback, with the robot's eyes
changing colour depending on the answer given. In addition, the story does not
continue until the correct card is shown, so the children effectively have to learn
in order to continue the activity.
[6] MotionSavvy. [Online]. http://www.motionsavvy.com/
[7] M. Marschark and M. Harris, "Success and failure in learning to read: The special case of deaf children", in Reading Comprehension Difficulties: Processes and Intervention, pp. 279-300, 1996.
[8] Hatice Kose, Rabia Yorganci, and Itauma Itauma I., "Humanoid Robot Assisted Interactive Sign Language Tutoring Game", in 2011 IEEE International Conference on Robotics and Biomimetics, Phuket, 2011, pp. 2247-2248.
Another system, created at the University of Maynooth [9], uses a different
approach. A computer-based virtual learning environment asks the user to perform a
sign after watching an avatar. The sign is then analysed with a webcam (with the
help of coloured gloves worn on the hands) to assess the student's performance. Once
the sign is performed correctly, the tutor moves on to the next sign; again, the
user is held back from progressing until the correct answer is given. In Tunis,
Jemni and Elghoul developed a web tool, WebSign [10], which creates courses for deaf
pupils, using graphics as an efficient pedagogical approach. It is very similar to
the previous approach, but aimed at children, so the vocabulary and the general way
of teaching are quite different in order to adapt to the comprehension level of
younger learners. One key feature that all these systems have in common is that they
provide an immersive learning experience while being able to run on a standard PC,
which is what our system also tries to achieve to some extent.
2.2 Existing Similar Systems
There is a wide range of language learning software available on the Internet. In
this section we discuss some of the most successful packages, the features they
offer, and the strengths and weaknesses they present.
2.2.1 Duolingo
Duolingo [11] is regarded by PCMag as "by far the best free program for learning a
language" [12] and offers an experience similar to having a personal tutor "in your
pocket" [13], as it is available both online and as a phone/tablet application.
However, it only supports a few languages, such as French, German, Italian and
Spanish. It includes very good features that help you practise a language, but not
necessarily master it. It also forces you to work through many levels before
reaching tutorials and questions appropriate to your current skill level. During the
learning process, you advance by unlocking the next level through high scores. At
the same time, your progress can be compared with your friends' on an online global
rating table (a feature known as "gamification") [14].
[9] Daniel Kelly, John McDonald, and Charles Markham, "A System for Teaching Sign Language using Live Gesture Feedback", Department of Computer Science, N.U.I. Maynooth, Maynooth, 2008.
[10] Mohamed Jemni and Oussama Elghoul, "Using ICT to teach sign language", in Eighth IEEE International Conference on Advanced Learning Technologies, 2008, pp. 995-996.
[11] Duolingo. [Online]. https://www.duolingo.com/
[12] Jill Duffy. (2013, January) pcmag.com. [Online]. See Bibliography for URL.
[13] Seth Stevenson. (2014, February) www.independent.co.uk. [Online]. See Bibliography for URL.
Figure 3: Picture of the NAO H25 robot.
It includes a personal trainer, represented by a green owl, that allows you to set
daily goals (Fig. 4); the application does not allow customisation of this avatar.
It uses photographic association as an additional learning technique to aid memory
when discovering nouns (Fig. 5). One problem with pictures is that users may glean
too much information from them and become distracted from the words they are
learning, guessing the correct answer instead of actually knowing it. This could be
addressed by making the picture a hint that the user can choose to reveal at the
cost of a point, or something equivalent.
[14] S.A.P. | The Hague. (2013, June) www.economist.com. [Online]. http://www.economist.com/blogs/johnson/2013/06/language-learning-software
Figure 4: Picture of the Duolingo avatar showing the user's progress.
Figure 5: Picture of the Duolingo interface. A standard quiz is shown here; as can be seen, pictures make finding the answer much easier.
Another issue is that the quizzes focus too much on mastering structural blocks of
knowledge rather than conversational skills. This means that if you do not already
know the basics of conversation it can be quite difficult to progress, at least in
the early stages. This makes it a good system for serious, committed beginners and
long-term learners, while being less suited to casual learners. Still, it has been
called the "most productive means of procrastination" and reaches 16-20 million
users a week, making it the most popular online language tutor [13].
2.2.2 Babbel
Babbel [15] is a paid tutor for anyone who "doesn't mind online-only programs" [12].
While very similar to Duolingo, it likewise does not offer instructor-led web
classes; however, it covers a wider variety of languages, currently supporting 11.
Unlike Duolingo, Babbel focuses on building basic conversational skills. For some
languages it is indeed more interesting to speak than to read (in this case sign
language is considered "spoken", because it is characterised only by visual
elements). Occasionally it sets more immersive lessons by explaining grammatical
concepts, and different categories are explored, including listening, speaking,
reading/writing and building sentences from words. It also uses picture association
to aid learning, though less prominently than Duolingo. The voice recognition does
not work very well and the interface is buggy at times. It does allow you to
customise your starting level of knowledge (meaning you can begin at a higher
level), but that is essentially the extent of the options available. On the mobile
version you can set a picture for your profile, but the virtual conversation does
not show any sort of avatar (Fig. 6).
[15] Babbel. [Online]. http://www.babbel.com/
Figure 6: Picture of the Babbel mobile interface. The interface is kept simple and easy to read. Conversational methods are the main teaching technique, along with storytelling, as in this example.
2.2.3 Assimil and Living Language
These two are less popular than the previous systems. Assimil [16] is a French
company that publishes foreign language courses. Instead of online tutoring, it
produces books and audio recordings from which users learn by listening and reading
physical copies. Offering a wide variety of languages in packages such as "on the
road" and "advanced", it can suit people who are less comfortable with computers.
Each book costs around €20, making it relatively expensive. Living Language [17] is
a blend of Assimil and online courses, offering books, audio recordings and web
tutorials with real teachers.
2.3 Survey
A survey was carried out to gather feedback from the community. It was a good
opportunity to collect thoughts and comments on the concept and to get a clearer
idea of what the project should target. The questionnaire comprised 10 questions
covering learning techniques as well as personal opinions on design decisions. In
total 85 responses were collected, and two versions were made available, in English
and in French; the latter was created because a few individuals from the
French-speaking community came forward explicitly showing interest in the project.
The full list of survey questions can be found in the Appendix of this report, while
complete results are in the Supplementary User Data in the supporting material
submission.
Although results were similar between the two communities, there were also some
differences. When participants were asked what had made them fluent in a language
other than their first, most selected "good teaching at school", which is
predictable since most of us learn faster and more easily while at school (Fig. 7).
However, most English-speaking participants also selected staying abroad as a
positive component of their learning process, whereas the French participants felt
that working on their own had helped more.
[Figure 7: Learning techniques. Survey responses (EN vs FR) across four options: good teaching in school, very hard work on my own, natural talent, staying abroad.]
Either way, this suggests that nearly everyone had to put in effort of their own to
achieve their results, which brings us to the next set of findings. When asked which
techniques they would use to learn a new language, most people replied that they
would use a software application or travel abroad.
[16] Assimil. Le don des langues. [Online]. http://fr.assimil.com/
[17] Living Language. The accent is on you. [Online]. http://www.livinglanguage.com/
This is further confirmed by the responses to a later question asking which
techniques they would find most useful if they had to learn sign language (Fig. 8).
[Figure 8: Learning techniques for SL. Survey responses (EN vs FR) across four options: training at deafness centres, reading on your own, online application tutoring, spending time with deaf individuals and practising with them.]
Although the margin is small, the results show that the English-speaking community
places high importance on direct contact with deaf people as well as on online
learning to get the best out of a new language. Despite a similar result regarding
direct contact, the French community seems to prefer training with experts over
using a computer application. This could be due to the lack of learning software
available in French, and hence a lack of trust in programs for this kind of task;
most applications released today are developed by American or British companies and
take an "English to X" approach to language learning, so there is little
personalisation in terms of the base language.
Personalisation was in fact a point that most people agreed on, as seen from the
results, although some also said it was not very important. This was a curious
answer, since the question concerning the colour of the avatar received very diverse
answers from both communities. The responses for the colours chosen are shown in
Figure 9 below.
[Figure 9: Preferred colours. Survey responses (EN vs FR) across five options: blue, red, dark skin, light skin, other.]
As shown above, the most frequently chosen colour was light skin, with blue as
runner-up. It was interesting to see that the English-speaking community also left
comments (in "Other")
on themes such as discrimination and sexuality. Some participants said, "I think
it's good to have a neutral colour to avoid racism and discrimination" and "Doesn't
matter in the slightest". As a result, it was decided to keep the choice open, with
an option to personalise the colour both when the account is created and afterwards.
Interestingly, the French community did not leave comments of this sort, with the
vast majority preferring light skin colour over all others. Most participants also
felt there should be a choice between a female and a male avatar; however, due to
time restrictions and the complications that further modifying the model would
introduce, it was decided to keep it as a boy. In-system feedback was also covered,
and it seems everyone would prefer to have both voice and visual feedback while
using the application.
Deploying this survey proved a very valuable asset during the development of the
system, and the results included some unexpected responses. All of the feedback was
taken into consideration, and particular attention has been given to the
personalisation of the avatar. The next section discusses the results of the meeting
with Prof Bencie Woll, held to deepen understanding of the deaf community and of the
way people learn sign language.
2.4 Interview
An interview was conducted with Prof Bencie Woll, an expert and researcher in the
field of deaf studies at UCL. The questions asked during the interview can be found
in the Appendix of this report. Most of the questions were answered, although the
interview was more of an informal discussion on the topic and the project itself.
Originally I was planning to create this project specifically for parents of deaf
children since, according to Prof Woll, the majority of those parents never actually
learn the language. This is not because they do not want to, but because they are
usually advised against it: their children should mix with hearing people (getting
them used to interacting with others to improve their social skills), while the
adults should meet other parents of deaf children to get an idea of how their kids
will develop in the future. Nevertheless, some kind of interaction with the children
has to exist, and parents therefore need to learn at least a little sign language.
This is where Prof Woll advised me to change the scope of the project. When learning
something new (a new language, how to write, how to draw, etc.) you need it to be
taught correctly, otherwise you will struggle to adjust later on; the same applies
to sign language. Virtual learning is known to be far from learning by practising
with other humans and mixing with people (two-way communication). Suppose the
teaching model is slightly wrong: an adult learning from that model would realise
that some of the signs are not entirely correct, and this can be fixed by mixing
with the deaf community. Teaching the same wrong model to a young child, however,
can be very damaging. As a result, it was decided that the project should be aimed
at adults in general and not specifically at parents of deaf children. Furthermore,
even though the software is aimed at a broader public, "the quality of that first
stuff must be good, otherwise they [users] can be discouraged because they learn
something that doesn't work".
Prof Woll then went on to discuss the project itself and mentioned a few potential
issues. Firstly, virtual models do not carry enough information: unless they are
generated through motion capture, avatars lack the "naturalness" of a real person.
The face also lacks information that normally complements the hand movement. For
example, most signed words are also mouthed (lip movement with no sound) to
accompany the gestures, because two signs can look the same but mean different
things: "Finland" and "metal" are signed in exactly the same way, and only lip
movement differentiates them. Facial movement can also convey low or high intensity
and emotions such as anger or happiness, just as gaze follows the flow of the sign:
if I am pointing somewhere with my hand, I should be looking in the same direction.
Without facial information the sign would be monotone, much as Stephen Hawking's
synthesised voice is to spoken language.
I showed Prof Woll the avatar design and she was pleased with it, mentioning only
that the hands looked slightly too large, which would not normally be a problem.
Looking at the survey, she also said that there are barely any differences between
male and female signers, so creating two models would be unnecessary. I mentioned
that the model did not have any mouth movement implemented, since I knew that
creating a working module for the arms and hands alone would take a long time within
the available timeframe. She then showed me a project from Martin Wright's Game Labs
at London Met [18], where a virtual avatar is used to generate signs. The model,
however, looks far from human: it is a genie with a jet propeller at the waist and a
large, bearded head. The beard is actually a clever workaround for the lack of mouth
movement, since it makes the mouth virtually invisible; and because that model is
driven by motion capture, the beard does exhibit some movement, even if imprecise.
She then briefly discussed the current lack of work on creating complete sentences
in sign language.
This project aims to create a program that generates 3D signs in real time by
reading from a file. Prof Woll pointed out, however, that tackling the creation of a
sentence from more than one file would be more interesting. This means reading
several 3D files and interpolating the positions in between, linking them together
to create a longer sentence. But there are a few issues:
• Reading from multiple files can generate the signs, but a written sentence has a
different structure to the signed one. The sign files would therefore have to be
pre-selected, making this not much different from simply using one file with the
whole sentence already generated. "Although this can be explored, especially in the
field of TV where subtitles could be converted to sign language for deaf spectators,
it would require AI and Machine Learning and we are far from reaching that level of
realism."
• Many things happen at the same time. In sign language, one hand can be doing one
thing and the other something else, yet they may be two separate signs. How would a
computer know that they should be executed together and not one after the other? In
addition, hands that are not being used should be returned to a rest position or
prepared for the next sign; that also cannot be pre-described.
[18] Martin Wright. (2015) GamelabUK. [Online]. http://www.gamelabuk.com/
As a final point, since this project aims at "improving sign language
communication", it is not just exploring the 3D generation of signs but also the
experience of learning the language. It was therefore suggested that the learning
software should include a self-assessment element, so that users can evaluate their
own performance. As Prof Woll put it: "In spoken language, I can hear you speak and
at the same time I hear myself speak. Hearing myself allows me to compare what I'm
saying to what you're saying, easily knowing if it sounds right or wrong. But in
sign language you do not see yourself sign, therefore it's hard to tell if you're
making mistakes." By recording themselves, users could view their performance side
by side with the original, allowing them to see the mistakes they made or how close
they came. As a further step, the two video streams could be compared with image
processing to analyse the similarities and differences and assess objectively how
close the attempt was to the target sign.
2.5 Summary
Due to the limited time frame and the breadth of the fields touched by the concept
of virtual sign language teaching (image processing, real-time interpolation of
signs, translation from written language, etc.), I came to realise, with the help of
Prof Woll, that I had to decide exactly which problem I wanted to address in order
to make this a working yet interesting project. From the original parent-focused
concept, I refocused on creating an application for a broader audience. I believe
this decision made it easier to relate the work to existing systems, which are aimed
at a very general set of stakeholders, and at the same time avoided any potential
damage to a child's learning in the event of inaccuracies or confusion in the
development of the signing avatar. With regard to the avatar itself, it was not
possible to implement facial animation within the time available; however,
particular attention has been given to the signing animation of the arms and hands
in order to achieve results as close as possible to the human equivalent.
3. REQUIREMENTS SPECIFICATION
This section lists the functional and non-functional requirements for the system.
Functional requirements are divided into primary (functionality that must be
implemented, unless there is a valid reason why completion is impossible) and
secondary (implemented if time allows). The section is further divided into
specifications for the 3D animation and tutoring application components. The
specifications have been determined from the results of the survey, the discussion
with Prof Bencie Woll, and background research.
3.1 LWJGL – 3D Animation
3.1.1 Functional Requirements
This core unit of the system is where the 3D model will be animated to recreate the
signs. With the help of LWJGL [19], the unit should:
• Import a 3D model stored in the Collada [20] (XML) file format
• Draw the vertices, lines and polygons in a 3D coordinate system, with
appropriate colours also obtained from the information file
• Build a skeletal structure representing the base structure of the animated avatar
• Import the SRT (Scale, Rotation, Translation) matrices from the same XML
file mentioned above
• Calculate the matrices for each node situated between bones
• Apply the matrices to the rotation of the nodes in order to place the skeleton in
its rest pose
• Apply skinning to the previously imported mesh, i.e. calculate the influence
(weight) each node exerts on the vertices (see the sketch after this list)
• Import a secondary file containing the animation sequence to be executed on
the avatar
• Allow a change of the point of view, by rotating around the character
• Play and pause the animation at any time during playback
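To make the skinning requirement concrete, the sketch below shows linear blend
skinning, a standard technique that matches the per-vertex joint weights described
above: each vertex position is a weighted sum of the positions produced by its
influencing joints. The names are illustrative only; the project's actual
implementation is described in Chapter 5.

```java
/** Minimal linear blend skinning sketch (illustrative, not the project's actual classes). */
public class SkinningSketch {

    /** Multiply a 4x4 row-major matrix by a vertex position (w = 1). */
    static double[] transform(double[] m, double[] v) {
        double[] out = new double[3];
        for (int r = 0; r < 3; r++) {
            out[r] = m[r * 4] * v[0] + m[r * 4 + 1] * v[1] + m[r * 4 + 2] * v[2] + m[r * 4 + 3];
        }
        return out;
    }

    /**
     * Skinned position = sum over influencing joints of
     * weight * (jointWorldMatrix * inverseBindMatrix * restPosition).
     * skinMatrices[j] is assumed to already hold jointWorld * inverseBind.
     */
    static double[] skinVertex(double[] rest, int[] joints, double[] weights, double[][] skinMatrices) {
        double[] skinned = new double[3];
        for (int i = 0; i < joints.length; i++) {
            double[] p = transform(skinMatrices[joints[i]], rest);
            for (int k = 0; k < 3; k++) {
                skinned[k] += weights[i] * p[k];   // weights are expected to sum to 1
            }
        }
        return skinned;
    }
}
```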
3.1.2 Non-Functional Requirements
• The animation should be smooth and responsive to playback commands
• The avatar should have an appealing look and come across as friendly
• The hand movement must be as close as possible to the human signing style
[19] LWJGL. (2014, November) www.lwjgl.org. [Online]. http://www.lwjgl.org/
[20] Collada - Digital Asset Schema Release 1.4.1, March 2008.
3.2 Tutoring Application
3.2.1 Functional Requirements
This part covers the requirements for the software system with which users will be
interacting. It will include both teaching and assessment to create a complete tutoring
scheme. As primary requirements, the system must:
• Allow browsing of the interface through mouse interaction
• Have a specific distinction between the learning material and assessment work
• Include a point-rank system in the assessment work to judge the progress of
the user
• Automatically save the user’s progress after each session, and reload to the
point of termination when coming back
• Allow the creation of an account (locally saved) with a password, for users to
share a program on one machine
• Provide customisation of the avatar at the moment of creation of the account
and at a later time
• Display the animation of the avatar from the LWJGL unit in a frame on the
same page the user is on
• Provide text, sound and visual feedback so that signs can be committed to
memory more easily
• Calculate the score of the tests taken and provide feedback on performance
• Support installation of the software on Windows and Mac operating systems,
including all libraries
If all of the above have been implemented, and if there is time left, secondary
requirements state that the system should also:
• Give feedback during testing by filming the user trying to sign when
prompted, and then show a comparison of his or her movement side by side
with the avatar
Originally, the performance-feedback process was planned to include the Leap Motion
[21] controller, in order to analyse hand movement and calculate an approximate
"similarity" between the user's attempt and the correct sign. However, feedback from
the interview with Prof Woll led to the decision to abandon the tool: being at far
too early a stage of development, and therefore likely to worsen the usability of
the system, it was considered preferable simply not to include it.
[21] Leap Motion. [Online]. https://www.leapmotion.com/
3.2.2 Non-Functional Requirements
These define how the tutoring system should run:
• The interface should be responsive and look simple, without omitting any
fundamental function
• The program should be intuitive: we wish to avoid distracting the user from
learning sign language by making learning the system itself the main activity
3.3 Summary
The overall system should follow the techniques used by other successful products in
the same market area, in order to achieve similar user satisfaction and a positive
learning experience. At the same time it should be innovative, going beyond the
barrier of pre-synthesised video as a source of information and towards a virtual,
flexible and lightweight data-driven approach. The correctness of the animation
implementation is crucial to how the application as a whole is perceived. As Prof
Woll pointed out: "the quality of that first stuff [referring to the learning
material as a beginner] must be good enough, because nothing is more likely to
discourage a person than doing something and then discovering that actually it
doesn't work".
4. DESIGN
This section covers the structure and design of the tutoring application, discussing
interface design choices, user interaction and the overall functionality of the
system. These choices are backed up by references to existing similar products, the
interview, the survey and feedback from potential stakeholders.
4.1 Aims
Signer, the name assigned to the system, is the application containing the
infrastructure for a complete, user-friendly program. Through it, users are
introduced to sign language and assessed so that they can progress steadily by
building their knowledge and skills. From the background research and interview, it
was established that the application should attempt to break the barrier of
pre-synthesised video and make sign language more accessible to the hearing
community. Given the potentially high complexity introduced by 3D animation, only
the most fundamental and necessary actions should be made available to the user. As
a learning tool, however, it should also include all of the basic functionality
found in existing similar products.
4.2 Pedagogical Approach
The purpose of this project is to make BSL accessible to the hearing community
through a virtual learning environment. It was therefore established that the
pedagogical approach should be as close as possible to that of already popular and
successful language-learning systems. The tool is structured in an iterative format,
where the user follows a lesson starting from the lowest level and subsequently
takes a test in which their skills and learning are assessed.
This short section describes how the teaching component was set up to teach sign
language effectively and to assess the learner's knowledge fairly. To accomplish
this correctly, further research into teaching techniques and into evaluating the
correctness of signs in BSL had to be carried out.
4.2.1 Learning and Assessing
Unlike Duolingo [11], and more like Babbel [15], the application aims at steadily
building the user's conversational skills in BSL. However, a structure analogous to
Duolingo is used, where skills are learnt first and assessed later. Since I had no
knowledge of BSL before starting this project, it was necessary to use available
resources to begin learning the skill myself. Out of all the appropriate guides and
websites, I decided to use the For Dummies franchise's instructional book on British
Sign Language [22], as it would serve as a good basis of starting material to
include in the project.
[22] Melinda Napier, James Fitzgerald, Elise Pacquette, and City Lit, British Sign Language For Dummies, 1st ed. London, UK: John Wiley & Sons, Ltd, 2008.
To further increase my understanding of BSL, I also used Prof Woll's book, The
Linguistics of BSL [23].
Lessons start with the simplest signs learnt by beginners, such as "Hello", "I'm
deaf" and "I'm hearing", building up to longer sentences requiring more nouns,
adjectives and verbs. The difficulty increases as the user progresses through the
levels, learning something new each time. Each lesson comprises a short virtual
conversation between two people, with a total of 6 sentences. Sentences contain a
varying number of signs depending on the complexity of the message to be conveyed.
The user is advised to practise the signs shown to them in order to build muscle
memory and to link English words to individual signs more easily; this is necessary
in order to perform well in the assessment section. Every lesson has a corresponding
test, labelled with the same level number. Assessment is performed using 3 types of
challenges:
• Translating: the user is given a sentence in English or in BSL and must translate
it from one language to the other. BSL sentences are given as blocks of individual
signs, so that, for example, "Hello, I'm deaf" corresponds to HELLO ME DEAF and
vice versa.
• Recognising: the user is shown a sentence performed by the virtual avatar and
given a choice of multiple answers in English, of which one has to be selected.
• Performing: this is where the learner's signing skills are tested. The user is
given a sentence in English and must perform the matching BSL version. The original
3D rendition is then displayed side by side with the user's attempt, and the user
chooses an appropriate mark out of 4 for each feature of the signing; this marking
method falls into the category of self-assessment.
4.2.2 Marking Scheme
Each test is composed of 7 challenges, a mixture of the 3 types described above, and
points are given out of a total of 100. A test will usually include 3 translating, 2
recognising and 2 performing challenges. The challenge types are assigned different
weights that make up the final mark (Table 1); a worked example is given after the
table.
• Translating counts for 10 out of 100: if all elements (all words or signs) are
present and in the correct order, full marks are awarded; for each wrong word, 3
points are deducted; if words are missing, 0 points are awarded.
• Recognising counts for 15 out of 100: a correct answer gives full marks, and an
incorrect answer gives 0 marks.
• Performing counts for 20 out of 100 and considers several elements. The user gives
themselves a mark from 0 to 4 for each of the following features (self-assessment):
   i. Hand movement – accounts for 80% of the 20 points
   ii. Facial expression – accounts for 10%
   iii. Mouth movement – accounts for 10%
[23] Rachel Sutton-Spence and Bencie Woll, The Linguistics of British Sign Language: An Introduction, 3rd ed. Cambridge, United Kingdom: Cambridge University Press, 1999.
The final mark is out of 100 and the user passes the test if they achieve at least
60 out of 100.
The percentage for each feature had to be estimated based on the importance of these
elements in BSL signing, as made clear by Prof Woll in the interview and in further
correspondence: "For some sentences, a specific facial expression is an essential
part of the sentence; for other sentences, some non-neutral facial expression is
essential but not specified; for other sentences it's just not particularly
relevant. The amount of mouthing varies across signers and also varies depending on
what sign is being accompanied", B. Woll (personal communication, March 26, 2015).
Because there is no current standard for assessing the correctness of a particular
signed sentence, it was decided to keep the weight of facial expression and mouthing
low when calculating the final mark. Moreover, since the avatar does not replicate
those facial features, it was in fact suggested to consider mainly what the learner
is actually shown. Eye gaze is another element that should ideally have been
considered, but it was not possible to fit it into the scope of this project.
When the user progresses to the next level, the lesson for that level is unlocked,
and the corresponding test is unlocked only once the lesson has been completed. If
the user attempts the test but fails, their level is not increased and the test must
be taken again.
Table 1: Marking Scheme
Challenge    | Weight (/100) | Occurrences | Overall Weight (/100)
Translating  | 10            | 3           | 30
Recognising  | 15            | 2           | 30
Performing   | 20            | 2           | 40
Total        |               |             | 100
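As a worked example of the scheme above, the sketch below computes the mark for a
hypothetical test using the weights from Table 1; the method names are illustrative
only and do not come from the Signer code base.

```java
/** Worked example of the marking scheme in Table 1 (illustrative names only). */
public class MarkingExample {

    /** Performing: 20 marks, split 80% hand movement, 10% facial expression,
     *  10% mouth movement; each feature is self-assessed on a 0-4 scale. */
    static double performingScore(int hand, int face, int mouth) {
        return 20.0 * (0.8 * hand / 4.0 + 0.1 * face / 4.0 + 0.1 * mouth / 4.0);
    }

    /** Recognising: 15 marks for a correct answer, 0 otherwise. */
    static double recognisingScore(boolean correct) {
        return correct ? 15.0 : 0.0;
    }

    /** Translating: 10 marks, minus 3 per wrong word; 0 if any words are missing. */
    static double translatingScore(int wrongWords, boolean wordsMissing) {
        return wordsMissing ? 0.0 : Math.max(0.0, 10.0 - 3.0 * wrongWords);
    }

    public static void main(String[] args) {
        // A test of 3 translating, 2 recognising and 2 performing challenges (total /100).
        double total = translatingScore(0, false)   // 10.0 (perfect translation)
                     + translatingScore(2, false)   //  4.0 (two wrong words)
                     + translatingScore(0, true)    //  0.0 (words missing)
                     + recognisingScore(true)       // 15.0
                     + recognisingScore(false)      //  0.0
                     + performingScore(4, 2, 3)     // 18.5 (16 + 1 + 1.5)
                     + performingScore(3, 4, 4);    // 16.0 (12 + 2 + 2)
        // 63.5 >= 60, so this attempt passes.
        System.out.printf("Total: %.1f / 100 -> %s%n", total, total >= 60 ? "pass" : "fail");
    }
}
```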
4.3 User Interface Design
Building upon the research into previous work and the survey, the user interface has
been designed to integrate all the features necessary to make Signer a complete,
all-round, user-friendly tutoring application. As mentioned in the requirements
specification, the user interface should allow the 3D avatar to blend in smoothly
and become a homogeneous part of the interface. The interface has been kept as
simple as possible to avoid confusing the user, which could result in low usability.
4.3.1 Avatar
In order to express sign language and communicate it to the user, a virtual avatar had
to be designed (Fig. 10). Given the positive feedback from the survey in that regard,
the original avatar design has been preserved (See Section 10.1 Survey Questions for
pictures). When creating the avatar, two main features were kept in mind:
• It has to come across as friendly
• Its arm and hand shapes should be anatomically correct in order for sign
movements to look as close as possible to the human equivalent
Following the interview feedback, it was also decided to remove the lower body (legs
and hips) entirely, since it is not necessary for communicating sign language. In
any case, the avatar is never meant to be shown from far away, so the user will not
necessarily notice the missing body parts. This also helps to decrease computation
time by reducing the number of 3D points in the model (see Section 5.1, 3D Animation
Module).
The 3D avatar was modelled in Cinema 4D24 and comes in a variety of colours, most of
which can be found in the survey statistics (Fig. 9). The user can choose from a
selection of pre-defined colours; Figure 10 shows a possible colour combination for
the avatar. Modifiable elements include the eye, hair and skin colours. Light blue was
chosen as the background colour because it gives a more delicate effect than a
completely white background.
4.3.2 Low Fidelity Prototype
24 Cinema 4D. (2014, October) www.maxon.net. [Online]. http://www.maxon.net/
Figure 11: Low fidelity prototype drawn on paper. Top left: the Login page; top right: the Signup page; bottom left: the home screen; bottom right: the Lesson page.
Figure 10: Avatar as shown in-window.
A low fidelity prototype was designed on paper to create an initial flexible
visualisation of what was possibly going to become the final look of the system (Fig.
11). The low fidelity prototype shows 4 pages that had to be designed:
1. A login page
2. A sign up page
3. The main menu of the application (and a way of editing user details)
4. A lesson page, where the avatar would have to be shown
The lo-fi prototype does not include a Test page for user assessment; that is because
the design approach in that regard was still uncertain at the time. The temporary
design was then brought into Photoshop25
and Illustrator26
to create a rendered version
of the low fidelity prototype. This helped immensely to visualise the colour scheme,
fonts and other layout factors to be considered during implementation. In addition,
buttons and icons could be easily exported as images to be used in the hi-fi prototype.
The colour palette of choice comprises sky blue, orange, dark grey and white
(Fig. 12a). Blue usually conveys a feeling of peace and is therefore the most
prevalent. Orange is the second most prevalent colour, given its pleasing appearance
when matched with blue. Other systems have used similar colours in the past
(Duolingo: blue, Living Language: orange), and since the two are aesthetically
pleasing together, it was a favourable design choice to combine them. Finally, white
and grey have been used for those interface elements where colours other than blue
and orange are necessary. These colours have been
extensively used throughout the interface to keep a consistent look. A logo was also
designed and is visible at all times in the application (Fig. 12b).
4.3.3 Rendered Prototype
After being processed with Photoshop and Illustrator the interface pages were used as
a template when implementing the high fidelity prototype. All the elements created in
the professional software were directly applied to the implemented system. This
approach meant the final version of the application’s interface would look nearly
identical to the pre-rendered result; application screenshots can be seen in Figure 13.
The screenshots represent the final version of the system, meaning some original
designs have been modified after feedback from the project supervisor and potential
stakeholders.
25 Adobe. (2015, January) Photoshop. [Online]. www.photoshop.com
26 Adobe. (2015, January) Illustrator. [Online]. http://www.adobe.com/uk/products/illustrator.html
Figure 12b: Signer logo as seen in
the application.
Figure 12a: Colour palette.
4.4 User Interaction
For this section, please refer to the Task Flow and Screen Flow diagrams for more
details (Fig. 14a & 14b). When first opening the application, the user is presented
with a login page, where they can log in, sign up if they do not yet have an account,
or quit the program entirely.
Figure 14a: Low granularity Task Flow diagram illustrating the main actions available to the user.
Figure 13: In-window screenshots of the final interface designs. Login page (Upper-left), Sign up page (Upper-
right), Main menu page (Lower-left) and Lesson page (Lower-right).
Signing up requires the user to create a username and password, as well as to choose
avatar characteristics. These include skin, eye and hair colours, which will be applied
to the 3D generated avatar when demonstrating signs. This option was made available
as a result of the survey question mentioning customisation of the avatar, for which
most people did express interest (Fig. 9). When all the information has been suitably
inserted, a user account is created and the Main Menu is shown (Fig. 13, Lower-left).
Once in the main page, the user is able to access lessons and tests, and also possibly
change their information like avatar colour, password and username if wished. The
main page also displays the current level of the user informing them of which lessons
and tests are available. The orange locks on the yet-to-be-completed sections of the
system further emphasise this.
When clicking on “Profile” in the Main Page, the user is brought to the “Edit Details”
page, nearly identical to the “Sign Up” page (Fig. 15). The two pages share the same
layout; the only differences are the title, the buttons available (in Edit Details the
account can also be deleted), and the fact that “Sign Up” creates a new account while
“Edit Details” simply updates the existing one.
Figure 15: Clicking on profile brings up the Edit Details page, nearly identical to the Sign Up page in Fig. 13.
Figure 14b: Screen flow diagram displaying the pages the user has access to once the application is launched.
If a lesson is selected, the page shown in Figure 16a is presented. The avatar is
loaded and the animation is displayed on top of the corresponding text accompanying
the sign. Audio feedback is also available and is synchronised with the individual
signs. This allows the user to further concentrate on the animation and movement,
rather than having to constantly switch between written text and video. The animation
is looped and the sound can be turned off if preferred. The progress bar is placed on
the side and increases with each new page that is opened. In the lo-fi prototype
(Fig. 11) the layout of the lesson page is slightly different; the changes followed
discussions with the supervisor and potential stakeholders, who pointed out that
placing the text at the side of the video would make it harder to follow. Figures 16a
and 16b show the different camera angles that can be selected to better understand
the gestures.
When the lesson is completed the user unlocks the corresponding test (see Test
column in Fig. 13, Lower-left). Each test comprises recognising, translating and
performing components (Fig. 18). In the performing component, the user is given a
sentence in English and asked to perform the corresponding sign while being
video-recorded. The camera records for the time it would take to perform the sign and
then stops automatically; the user is informed of this time limit by a small visual
countdown in red (Fig. 17).
Figure 16a: The lesson page as seen in-program. The avatar is displayed performing the sign textually described below. Black text → sign, white text → English.
Figure 16b: Pressing the buttons shown in Figure 16a changes the point of view. This can be done while the animation is playing or while it is paused.
Once the user is happy with their interpretation (multiple attempts are available) they
can submit the final version. The window in Figure 18c is then presented, asking the
learner to grade themselves using the features listed in the Marking Scheme (see Section 4.2.2 Marking
Scheme). The avatar can also be seen with a different colour scheme determined by
the user’s choice. Please note that when the program is being used, the cartoon avatar
would be replaced with a video recording of the real user.
When all challenges for a test have been completed and the overall mark surpasses
60%, the user level is increased. This grants them access to the next lesson in the list,
which they will have to complete before attempting the matching test.
4.5 In-System Feedback
Taking lessons and passing tests is a good way of learning a skill, but people usually
want to have some sort of feedback during their training. From the background
research on existing systems it was noticed that most applications include a progress
component where the user’s advancement can be viewed. Signer offers the same
functionality; one’s progress is in fact saved every time a milestone is passed.
Figure 18: Recognising (a), Translating (b) and Performing (c) challenges in respective order. Different
coloured 3D avatar and cartoon signer courtesy of http://www.british-sign.co.uk.
Figure 17: Performing challenge and visual countdown (red sphere). Cartoon signer
courtesy of http://www.british-sign.co.uk.
Milestones denote attending a lesson (0.5 level up) and passing a test (1 level up).
After logging in, learners can check their progress so far in the progress section (Fig.
19a). The last known activity will always be displayed as the last day on the chart,
thus the listed days might not always start from Monday. The page also displays how
many days have passed since the last activity, i.e. the last time a milestone was completed.
In addition, after each test, a full breakdown of the user’s strengths and weaknesses is
shown (Fig. 19b). This adds an extra dimension to the learning experience, making
areas for improvement easily identifiable and giving the user a precise idea of how
well or badly they did. When users do not do well, the feedback should be constructive
without being negative or condescending. For example, while the message for a good
result says “Congratulations, you passed!”, the feedback for a bad result should be
more helpful; user testing was necessary to determine what kind of message users
would prefer to see (see Section 6.2 User Testing).
Finally, if the user is unsure what to do on a particular page, an info button can be
consulted at all times. It lists and explains basic information about the current page
to help the user follow the correct flow of actions.
4.6 Summary
This chapter described the user interface design for the Signer application. It supports
all of the functionalities listed in the requirement specification (Section 3.2.1). The
teaching approach has been kept in mind at all times in order to make the interface
elements easy to comprehend and the user task flow intuitive. The next chapter will
discuss the underlying implementation of the system, including the avatar animation.
Figure 19a (Top): Progress page as displayed in Signer. The Bar
chart was developed using JFreeChart. Information on the last
activity is shown below, and a legend on the side complements
the understanding of the chart.
Figure 19b (Right): Feedback window showing good points and
possible improvements.
5. IMPLEMENTATION
In this chapter we will be going through the software design features and choices that
helped successfully implement the 3D avatar animation as well as the tutoring
software delivering the avatar to the user as a learning tool. There are two main
sections: the 3D Animation Module, consisting of the core OpenGL27 engine together
with the XML parser and the algorithms that generate the animation, and the
user-accessible content with its underlying structure. Before going into the details of the
system structure, it is necessary to explain how the avatar animation was developed.
5.1 3D Animation Module
Several sources of information and tutorials have been carefully researched and
employed to best understand the theory behind 3D animation. Part of the theory was
already established through the Computer Graphics module studied in semester one.
However, that was not enough, since this part of the project focuses principally on
skeletal animation. References to the aforementioned tutorials can be found in
Chapter 9, Background Reading.
5.1.1 Design Choices
The main objective of this part of the project was to implement a module that would
load a Collada file written in XML, interpret the information and build an animation
from it. Collada was chosen as the best file format for 3D since it is able to contain
nearly all the information needed (mesh, animation, joints etc.) to correctly build the
avatar. Other 3D files like .obj or .fbx were either incomplete or corrupted most of the
time and were consequently not considered. Given my proficiency in Java acquired over
the past years, it was chosen as the language of preference for the main structure,
while LWJGL19 (Lightweight Java Game Library), a Java binding that exposes OpenGL to
Java programs, was the best option for animating the avatar model.
Figure 20 illustrates the summarised process from loading the Collada files to
displaying the animations and how all components are linked together; this figure can
be used as a visual reference while reading the implementation of the 3D module.
Throughout this section of the report, note that the terms “bone” and “joint” refer to
the same thing and will be used interchangeably.
27 OpenGL. (2014, November) www.opengl.org. [Online]. https://www.opengl.org/
5.1.2 Skeletal Animation
Skeletal animation is a technique in which a character is drawn in two main parts: a
surface representing the actual character, called the skin or more formally the
“mesh”, and a hierarchical set of bones or joints (the skeleton), which is animated
and drives the overlying skin. The model used for this project was created so that it
does not look too human (avoiding an unsettling appearance) but not too unrealistic
either, since otherwise it would be hard for the user to relate to it while signing28.
The Cinema 4D viewport in Figure 21 shows the avatar in bind position. The underlying
skeleton can be seen through the skin. When the skin is attached to the skeleton and
the skeleton is consequently no longer visible, the model is in rest position.
28 Nicoletta Adamo-Villani, Ronnie Wilbur, Petra Eccarius, and Laverne Abe-Harris, “Effects of character geometric model on perception of sign language animation”, in 2009 Second International Conference in Visualisation, Barcelona, 2009, pp. 72-75.
Figure 21: Picture of the avatar mesh in bind pose. This is the original position of the modelled mesh.
Figure 20: 3D animation module process. The Shader calculates the animations and updates the mesh for each
keyframe.
5.1.2.1 Skeleton Structure
As seen above, the model is normally positioned with the arms stretched out; this is
the bind pose. The skin is attached to the skeleton through a skinning algorithm
(Algorithm 2), which is explained in detail later. The skeleton is in fact composed of
multiple joints, interconnected, in a tree-like structure. Such structure can be seen in
the following schema (Fig. 22):
This hierarchical structure of the bones can also be represented as text (Fig. 23).
The Root joint connects all other joints; every joint has at least one child joint,
except for the end nodes (those whose names include the keyword End). These names are
given to the joints inside the Collada file and are imported into the program so that
each joint can be distinguished from the others.
Figure 22: Skeleton tree structure as seen in the Cinema 4D editor viewport.
Figure 23: Skeleton tree structure represented with text.
5.1.2.2 World and Local Space
Each joint (called Bone in the program) has 5 main attributes:
• Name: this is the joint’s name, used to distinguish it easily
• fileID: this is the unique ID the joint has in the XML file
• AnimationLocalMatrix: this is the matrix defining the local space
transformation
• AnimationWorldMatrix: the matrix used for world transformations
• Children: this is an array of joints used to store references to this joint’s
children (or sub-nodes in the tree)
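For illustration, a minimal sketch of such a Bone class is shown below. It assumes LWJGL 2's Matrix4f utility class (org.lwjgl.util.vector) is available; the field names mirror the attributes listed above but are otherwise an assumption rather than the report's exact code.

import java.util.ArrayList;
import java.util.List;
import org.lwjgl.util.vector.Matrix4f;

// Hypothetical sketch of the joint structure described above.
public class Bone {
    String name;                                     // joint name, e.g. "LeftShoulder"
    String fileID;                                   // unique ID taken from the Collada XML
    Matrix4f animationLocalMatrix = new Matrix4f();  // transform relative to the parent joint
    Matrix4f animationWorldMatrix = new Matrix4f();  // transform in world space
    List<Bone> children = new ArrayList<>();         // sub-nodes in the tree

    Bone(String name, String fileID) {
        this.name = name;
        this.fileID = fileID;
    }
}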
The local space transform is directly exported from the Collada file and is given in a
format of SRT (Scale Rotation Translation). This means the matrix will contain all
information about position in space, local rotations and scaling (not used here).
$$\text{Translation} = \begin{bmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \qquad \text{Rot}_X = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta & 0 \\ 0 & \sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

$$\text{Rot}_Y = \begin{bmatrix} \cos\theta & 0 & \sin\theta & 0 \\ 0 & 1 & 0 & 0 \\ -\sin\theta & 0 & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \qquad \text{Rot}_Z = \begin{bmatrix} \cos\theta & -\sin\theta & 0 & 0 \\ \sin\theta & \cos\theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

Figure 24: Translation and Rotation matrices used in 3D animation.
When all multiplied together by doing RotY ⋅ RotX ⋅ RotZ and replacing the last
column of the resulting matrix with the Translation Matrix, we get the Local
Transformation matrix (Fig. 24). Local transformations are in fact the transformations
of the child joint in relation to its parent. To be able to extract the correct point
positions in space (or World positions), we must use Forward Kinematics to
calculate the World Transformations for each joint. This is done in Algorithm 1:
for each bone in skeleton:
    if bone == 0:
        bone WorldMatrix = bone LocalMatrix
    END if
    for each child of bone:
        child WorldMatrix = bone WorldMatrix * child LocalMatrix
    END for
END for

Algorithm 1: Algorithm createRotationMatrices()
In the above pseudo-code, for each bone in the skeleton, each child's World matrix is
computed by multiplying the child's Local transformation matrix with its parent's
World matrix. The only exception is the Root, whose World matrix is the same as its
Local matrix since it does not have a parent node. After doing so, we can extract the
fourth column of each World matrix and obtain the corresponding XYZ positions with
which to draw the skeleton (Fig. 25).
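As a rough illustration of Algorithm 1, a recursive forward-kinematics pass over the Bone structure sketched earlier could look as follows; it uses LWJGL's Matrix4f.mul utility and is an assumption rather than the report's actual implementation.

import org.lwjgl.util.vector.Matrix4f;

public class ForwardKinematicsSketch {
    // Computes each joint's world matrix from its parent's world matrix.
    // Pass null as parentWorld for the Root joint.
    static void computeWorldMatrices(Bone bone, Matrix4f parentWorld) {
        if (parentWorld == null) {
            // Root: world transform equals its local transform
            bone.animationWorldMatrix = new Matrix4f(bone.animationLocalMatrix);
        } else {
            bone.animationWorldMatrix =
                    Matrix4f.mul(parentWorld, bone.animationLocalMatrix, null);
        }
        for (Bone child : bone.children) {
            computeWorldMatrices(child, bone.animationWorldMatrix);
        }
        // The joint's position in space is the fourth column of the world
        // matrix: (m30, m31, m32) in LWJGL's field naming.
    }
}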
5.1.2.3 Mesh Structure
A Mesh, often called skin, accompanies the skeleton. It is placed on top of the
structure and is what the viewer actually sees. A model stores information about all of
its vertices, normals and faces. A face contains two 3D vectors with vertex and
normal indices. These indices are pointers to the lists of all the vectors and normals
contained in the model. The data is extracted from the Collada file and stored in those
objects. Each face also has a colour attribute. The skin has to be “attached” to the
skeleton, so weights are applied to make the vertices follow the joints properly.
5.1.2.4 Joint Weights and Skinning Algorithm
In the model, each vertex has zero, one or more weights associated with different
joints; in other words, each vertex can be affected by multiple joints. This concept
is illustrated below (Fig. 26):
In the diagram above, there are two bones i and j. The surrounding box represents the
skin and each of the black dots is a vertex. Notice how, as we get closer to one joint
and further from the other, the weight of the former increases while that of the
latter decreases. This produces a smooth bend between joint rotations. The skinning
and distribution of joint weights was done in Cinema 4D (Fig. 27):
Figure 25: Application of local transform to each child joint and resulting world final positions
(courtesy of What-When-How.com).
Figure 26: Joint-weight distribution among vertices. Courtesy of What-When-How.com.
The avatar’s skin is coloured in different shades showing the joints that affect
different parts of the skin (different colour = different joint). In the Collada file
this information is given as a list of affecting joints (those that actually affect
the skin; the root does not), a list of all the weights, a list of all vertices (the
position indicating the index and the value indicating how many joints affect it), and
a list of indices arranged in pairs containing the index of the affecting joint and
the index of the affected vertex. The total weight for each vertex, i.e. all affecting
joints' weights combined, must equal 1.
We will not go into the details of how the data is parsed, but in the end, we are left
with each joint having a pointer (vertexWeightIndices in the program) to vertices
and their matching weight. In addition, each joint is also given an InverseBindPose
matrix. This matrix contains the necessary transformation to convert the model’s bind
pose position to its animated position. The complete skinning algorithm is defined as:

$$v' = \sum_{i=1}^{n} w_i M_i v \qquad (1)$$

where:
• $v'$ is the resulting (skinned) vertex
• $v$ is the non-transformed vertex
• $n$ is the number of associated bones
• $w_i$ is the weight associated with the bone
• $M_i$ is the skinning matrix for the bone

The algorithm (Eq. 1) takes the unchanged vertex $v$, transforms it with the joint's
associated skinning matrix $M_i$ and multiplies the result by the joint's weight
$w_i$. This calculation is done on the same vertex for all affecting joints, which
explains the sum $\sum_{i=1}^{n}$. The aforementioned skinning matrix $M_i$ is the
result of multiplying the joint's World transformation matrix (WorldMatrix in Alg. 1)
with its InverseBindPose matrix.

Figure 27: Distribution of joint weights seen in the Cinema 4D viewport. The
“weight painting” tool was used here to smoothly assign vertices to specific joints.

Here is the same algorithm (Eq. 1) as used in the program, in pseudo-code:
for each vertex in model:
    inputV = vertex
    outputV = new vector(0, 0, 0)
    total weight = 0
    for each bone:
        for each weightVertexIndex:
            if vertex == weightVertexIndex:
                total weight = total weight + weight
                v = SkinningMatrix transform inputV
                v = v * weight
                outputV = outputV + v
            END if
        END for
    END for
    if total weight not equal 1:
        normalised weight = 1/total weight
        outputV = outputV * normalised weight
    END if
END for

Algorithm 2: Algorithm calculateFinalPointPositions(Model m, Skeleton s)
To summarise Algorithm 2, the original vertex vertex is selected and a new blank
vector outputV is created. All the bones are then searched to see which ones affect
this particular vertex (if vertex == weightVertexIndex). When such a joint is
found, its weight is accumulated for a later check, a new vertex v is transformed with
the joint's SkinningMatrix and multiplied by the weight, and outputV is updated with
the result. At the end, an if statement checks whether the accumulated weight does not
equal 1 (the weights must be normalised for all vertices so that the effect is
smooth), in which case outputV is multiplied by the normalised weight. The mesh is now
ready to be displayed.
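A minimal Java sketch of the same linear-blend-skinning step is given below. The per-joint skinning matrices (world matrix multiplied by the inverse bind pose) and the vertex-to-weight maps are passed in as parameters, which is an illustrative arrangement rather than the report's exact data layout.

import java.util.List;
import java.util.Map;
import org.lwjgl.util.vector.Matrix4f;
import org.lwjgl.util.vector.Vector3f;
import org.lwjgl.util.vector.Vector4f;

public class SkinningSketch {
    // Returns the animated position of one vertex (Eq. 1 / Algorithm 2).
    static Vector3f skinVertex(int vertexIndex, Vector3f bindPosition,
                               List<Bone> bones,
                               Map<Bone, Map<Integer, Float>> vertexWeights,
                               Map<Bone, Matrix4f> skinningMatrices) {
        Vector3f out = new Vector3f(0, 0, 0);
        float totalWeight = 0f;
        for (Bone bone : bones) {
            Float w = vertexWeights.get(bone).get(vertexIndex);
            if (w == null) continue;                           // this joint has no influence here
            Vector4f v = new Vector4f(bindPosition.x, bindPosition.y, bindPosition.z, 1f);
            Matrix4f.transform(skinningMatrices.get(bone), v, v);
            out.x += v.x * w;
            out.y += v.y * w;
            out.z += v.z * w;
            totalWeight += w;
        }
        if (totalWeight > 0f && Math.abs(totalWeight - 1f) > 1e-6f) {
            out.scale(1f / totalWeight);                       // normalise, as in Algorithm 2
        }
        return out;
    }
}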
5.1.2.5 Animation Algorithm
The loaded model needs to be animated and the necessary information can also be
found in the Collada file. However this time a different file is imported containing
only the animations. This drastically reduces file size: keeping one file with only
initial position data (i.e. all of the aforesaid data) and multiple smaller files with
the joint rotations for the animations. The main file is about 1.8 MB, while each
animation file is around 50 kB. This is a big difference in terms of storage compared to pre-
synthesized video. The animation data is given in XML format in the Collada file
(Fig. 28).
<animation>
  <animation>
    <float_array>0 1 1.5</float_array>
    <float_array>92.3743 162.595 162.595</float_array>
    <channel target="ID67/rotateY.ANGLE"/>
  </animation>
  .
  .
</animation>
The outer animation tag refers to an animation for a specific joint. Each outer
animation tag has 3 inner animation tags: one for each axis (X, Y and Z). The first
float array contains the keyframes; these are the moments in time for which the joint
will have a specific rotation applied to it. In the example, 0 means the start of the
animation, 1 is 30 frames in and 1.5 is consequently 45 frames in and the end of the
clip. The following float array contains the actual angles to be applied at each of the
frames in the other array. Finally the channel tag specifies which joint and which axis
those changes have to be applied to.
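As an illustration of reading one such channel, the following sketch uses the JDK's standard DOM parser; the file name is hypothetical and this is not the parser actually used in the project.

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ChannelReaderSketch {
    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new File("hello_sign.dae"));          // hypothetical animation file
        NodeList arrays = doc.getElementsByTagName("float_array");
        // First array: keyframe times; second array: the angles at those keyframes
        String[] times  = arrays.item(0).getTextContent().trim().split("\\s+");
        String[] angles = arrays.item(1).getTextContent().trim().split("\\s+");
        Element channel = (Element) doc.getElementsByTagName("channel").item(0);
        System.out.println(channel.getAttribute("target")    // e.g. ID67/rotateY.ANGLE
                + ": " + times.length + " keyframes, first angle " + angles[0]);
    }
}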
We have therefore created an Animation object for each joint. This way all the
animations for the clip can be stored individually for each joint movement. The
Animation object provides two methods. setAngleDifference() iterates through the
keyframe list and calculates the constant increase between two frames. For example,
say we have two keyframes at frames 3 and 10, with a start angle of 90° and an end
angle of 140°. The span is |3 − 10| = 7 frames and the change is 140° − 90° = 50°, so
the increase per frame is 50/7 ≈ 7.14°. These values are stored in an ArrayList.
Once all angle differences have been computed, we can create the complete angles.
makeFullAngleList() will iteratively add the previously found difference to the
start angle until it reaches the objective end angle at the next frame. It is done this way
because Euler angles will not rotate properly if the angle difference is directly
multiplied with the previous value. As a result, it is necessary to add the angle to the
original value and constantly increase it. This process is also known as interpolation.
In this particular case, the interpolation is linear, meaning the joints have a constant
angle increase. Smoother animations can be obtained by using Spline interpolation,
where the angle difference decreases/increases when reaching a separating keyframe
(Fig. 29). Linear interpolation has been kept as the preferred method since hand and
finger gestures are so short the difference is barely noticeable.
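The following short sketch illustrates that linear interpolation step in Java; keyframe positions are given in frames and the class and method names are illustrative, not those of the actual Animation class.

public class InterpolationSketch {
    // Produces one angle per frame by adding a constant increment between
    // neighbouring keyframes (linear interpolation).
    static float[] interpolateLinearly(int[] keyframes, float[] keyAngles) {
        int totalFrames = keyframes[keyframes.length - 1] + 1;
        float[] angles = new float[totalFrames];
        for (int k = 0; k < keyframes.length - 1; k++) {
            int span        = keyframes[k + 1] - keyframes[k];            // e.g. 10 - 3 = 7
            float increment = (keyAngles[k + 1] - keyAngles[k]) / span;   // e.g. 50 / 7 ≈ 7.14
            for (int f = 0; f <= span; f++) {
                angles[keyframes[k] + f] = keyAngles[k] + increment * f;
            }
        }
        return angles;
    }

    public static void main(String[] args) {
        float[] result = interpolateLinearly(new int[]{3, 10}, new float[]{90f, 140f});
        System.out.println(result[4]);   // ≈ 97.14°: one frame past the first keyframe
    }
}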
Figure 29: Linear vs. Spline interpolation and the separating keyframes (red).
Courtesy of Wikipedia.org.
Figure 28: Animation data in XML format from Collada file.
Finally, the joints are modified using the animation algorithm. Very simply, it
iterates through each animated joint, that is, every joint that needs to be animated
(not all bones move during a clip). A loop goes through all the frames and
calls the algorithm passing to it the joint and the current frame as arguments. The
animate method actually looks similar to Algorithm 1, since they do basically the
same thing, except here the new rotation is calculated by extracting the angles
computed in the Animation class. Then each joint’s World matrix is produced and the
new vertex points (the mesh) are stored per keyframe by calling the skinning
algorithm. The mesh is now ready to be displayed.
5.1.3 OpenGL Engine in LWJGL
OpenGL was selected as the best programming interface to create a virtual
environment in which to display all the needed animations for this project. Since Java
was the language of choice, the LWJGL library was used to enable cross-platform
access to the API.
5.1.3.1 AnimationModule Class
The AnimationModule class is the main component, or connector of the animation
implementation. This class drives the rest of the components by calling the model and
skeleton loaders, shader and creating the display where the animations are shown. A
more complete 3D Animation Module class diagram of the animation engine can be
found in Section 10.4 in the Appendix. When called, the constructor of this class
initialises the model by calling loadModel(file) in the MeshLoader class, passing the
avatar's Collada export in rest position, and similarly initialises the skeleton by
calling the loadSkeleton(file) method in the SkeletonLoader class. It then creates the
rotation matrices to build the model in its rest position.
The next step is to construct the Display, where all the animations will be shown.
Display is a static class belonging to LWJGL and therefore cannot be instantiated;
this led to creating a separate thread and looping the animation process there. We
will come back to this implementation choice in Section 5.5 Difficulties.
The display is created but does not contain anything just yet. All the animations for
the clip are loaded with the shader, which will read the XML file with the animations
described earlier. The mesh is ready to be displayed and this is where OpenGL comes
into play.
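A minimal sketch of that dedicated animation thread, built around LWJGL 2's static Display class, is shown below. drawCurrentKeyframe() is a placeholder for the display-list drawing described next, and the window size and frame cap are assumptions rather than the project's actual values.

import org.lwjgl.LWJGLException;
import org.lwjgl.opengl.Display;
import org.lwjgl.opengl.DisplayMode;

public class AnimationLoopSketch {
    public static void main(String[] args) {
        Thread animationThread = new Thread(() -> {
            try {
                Display.setDisplayMode(new DisplayMode(640, 480));
                Display.create();                       // the GL context lives on this thread
                while (!Display.isCloseRequested()) {
                    drawCurrentKeyframe();              // placeholder for the real rendering
                    Display.update();                   // swap buffers and poll input
                    Display.sync(30);                   // cap the loop at 30 frames per second
                }
            } catch (LWJGLException e) {
                e.printStackTrace();
            } finally {
                Display.destroy();
            }
        });
        animationThread.start();
    }

    private static void drawCurrentKeyframe() { /* call the display lists here */ }
}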
5.1.3.2 Display Lists
Before starting to draw vertices, the 3D environment is prepared by setting up the
projection matrix; this takes into account the specified field of view and the maximum
and minimum rendering distances. After that, lighting is created by enabling GL_LIGHTING (Fig. 30).
// Lighting and rendering setup (LWJGL 2 / GL11). The light parameters are
// passed as FloatBuffers, since the brace literals in the original excerpt
// are not valid Java.
glShadeModel(GL_SMOOTH);                               // smooth (Gouraud) shading
glEnable(GL_LIGHTING);
glEnable(GL_LIGHT0);
...
FloatBuffer ambient = BufferUtils.createFloatBuffer(4);
ambient.put(new float[]{0.05f, 0.05f, 0.05f, 1f}).flip();
glLightModel(GL_LIGHT_MODEL_AMBIENT, ambient);
FloatBuffer lightPosition = BufferUtils.createFloatBuffer(4);
lightPosition.put(new float[]{1.5f, 1.5f, 1.5f, 1f}).flip();
glLight(GL_LIGHT1, GL_POSITION, lightPosition);
...
glEnable(GL_CULL_FACE);                                // hide faces pointing away from the camera
glCullFace(GL_BACK);
glEnable(GL_COLOR_MATERIAL);
In the above code, smooth shading is enabled to make lighting transitions less sharp,
while GL_CULL_FACE determines whether some of the faces (back or front) are hidden. In
our case GL_BACK means back faces are not drawn, since we do not want to render faces
on the far side of the skin.
Triangles now have to be drawn; in fact, most modelling software draws all sorts of
surfaces using triangles. A display list is created for each part of the model;
here we use a list for the details that include eyes and hair, while the body list draws
the remaining skin. Inside the glNewList function we begin drawing triangles with
glBegin(GL_TRIANGLES) and iterate through the faces in the model. The assigned
colour, vertices and normals for each are extracted. These last two geometrical
components are represented as Vector3f (3 floating values) and then converted into
glVertex3f and glNormal3f respectively. Once all faces have been extracted the
list is ended and the process repeated for the next one.
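As a rough sketch, building one such list from the model's faces could look like the following; the Face and Model accessors are assumptions based on the structures described in Section 5.1.2.3, not the project's exact API.

// Assumes: import static org.lwjgl.opengl.GL11.*; and the Model/Face
// structures sketched in Section 5.1.2.3 (accessor names are illustrative).
int bodyList = glGenLists(1);
glNewList(bodyList, GL_COMPILE);
glBegin(GL_TRIANGLES);
for (Face face : model.getFaces()) {
    glColor3f(face.colour.x, face.colour.y, face.colour.z);   // one colour per face
    for (int i = 0; i < 3; i++) {
        Vector3f n = model.getNormal(face.normalIndices[i]);
        Vector3f v = model.getVertex(face.vertexIndices[i]);
        glNormal3f(n.x, n.y, n.z);
        glVertex3f(v.x, v.y, v.z);
    }
}
glEnd();
glEndList();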
However nothing is being displayed yet. Once all required lists have been loaded they
are called with glCallList(listName) and the display is updated. Subsequently
all employed lists are deleted and memory is freed to start the process again. This is
done for each frame of the clip. In addition, since the program allows the user to
change views, methods called iterateViews and enterFPV are also implemented.
The former changes both the position and rotation of the point of view and sets the
field of view to 50°; enterFPV changes the position, rotation and field of view (to
90°), giving a first-person-view effect.
5.1.4 Summary
This was the most technically challenging section of the project, focusing entirely on
the creation of an animation engine. OpenGL could have been used to a higher extent,
since it uses GPU hardware acceleration to perform operations faster (e.g. matrix
calculations), therefore shortening loading times. However using native Java code to
implement all those tasks from scratch gave me a broader insight into the mathematics
involved in systems such as game engines and 3D software editors. Additionally, the
skills acquired will make it much easier to create a completely LWJGL-based engine
in future works.
Figure 30: OpenGL code to initialise lighting and rendering settings
5.2 Development Environment
The software was developed in native Java as well as LWJGL19
to access OpenGL’s
powerful graphical tools27
. All the coding was done in the NetBeans IDE29
and
multiple libraries were used including JFreeChart 30
, JVideoInput (a modified
OpenIMAJ library31
) and Slick32
to complement LWJGL. These external libraries
were necessary to implement the progress feedback using bar charts and the camera
module.
5.3 High Level Design
For this section and subchapter 5.4 Low Level Design, please refer to the Signer Class
Diagram in Section 10.5 in the Appendix. The Model View Controller design pattern
was determined to be best suited for this application since it allows a clear division
between user accessible and underlying components. Because the 3D animation is a
very complex package on its own, with this pattern we are able to easily access the
OpenGL features to be displayed on the user interface (View). A low granularity
diagram shows how the main packages interact with each other (Fig. 31).
5.3.1 Model
The Model contains all necessary information relating to User data and animation.
The package can directly interact with the 3D animation module by notifying changes
happening in the Controller. These changes are usually a result of user input, for
example clicking on a button to change point of view. In addition to holding User and
Animation data, the Model package partly contributes to displaying the User Interface
by representing the background of the application’s window system.
29 Oracle. (2014, December) NetBeans. [Online]. https://netbeans.org/
30 Object Refinery Ltd. (2015, February) JFreeChart. [Online]. http://www.jfree.org/jfreechart/
31 University of Southampton. (2015, February) OpenIMAJ. [Online]. http://www.openimaj.org/
32 Slick2D. (2014, November) Java Slick. [Online]. http://slick.ninjacave.com/
Figure 31: MVC design pattern on High Level. Solid arrows represent direct method calls, while
dashed arrows represent event notifications.
5.3.2 View
This package contains all the classes initialising UI components like buttons, labels,
etc. visible to the end user. The 3D animation will be displayed in those windows
where signing takes place; although it does appear on the front-end, the logic is
regulated by the Model package through the AnimationModule instance.
5.3.3 Controller
The controller is responsible for handling the actions of the User when interacting
with the UI elements such as buttons. It records the events and notifies the Model for
User data or animation changes. It also switches display windows by calling other
instances of view controllers.
5.4 Low Level Design
Here we will go into further detail about how the most important classes interact with
one another. Figure 32 shows an overview of all the existing packages used in the
application.
Not all packages are analysed, since some are not necessary for understanding the
program. Instead, the main classes in the Model, View, Controller and Extra packages
are described and explained to give an insight into the system structure. The
AvatarAnimation package was already discussed in the
3D Animation Module chapter of the report, and will therefore not be covered again.
Figure 32: Package class diagram giving an overview of the system. Some packages contain classes for multiple
objects like Extra, Buttons and Windows.
5.4.1 Model Classes
This package contains 2 classes:
• BgFrame: this class is the backbone containing the User and
AnimationModule objects. It also includes elements for the background,
banner and logo of the application windows.
• User: information about the user is stored in this class. User objects are
comprised of many variables describing username, password as well as other
data such as avatar preference and current skill level.
5.4.2 View Classes
This package contains all of the user interface objects necessary to use the program.
Each major component has a separate class, for example Lesson and Test are different
components. The classes in this package do not have any functional methods, except
for those initialising or modifying components. The classes are:
• LoginPage: this class initialises all components for the login window. It
therefore contains user login information (username, password) and signup
and login buttons.
• SignupDetailsPage: this class contains components for signing up, thus
includes many colour choice buttons as well as text fields for the user to
register (or modify) their details. Because signing up and editing details
require the same UI elements, the class is used to fulfil both functions.
• MainPage: this class displays the 3-column menu (as seen in Figure 13). It is
used to access everything in the application, from user settings, lessons, tests
and the progress page.
• LessonPage: here the components needed for the lesson layout are created and
initialised. These include a variety of buttons for navigation inside the 3D
viewport, as well as sound control and window navigation. Labels displaying
text for the signs and English text are also shown under the 3D viewport.
• TestPage: this class is slightly different from the others in the View package.
Although it is also only initialising and displaying UI elements, three if
statements are used to properly separate each test category and its required
components. For example if the currently selected test category is
Recognising, then only those elements needed for the recognition challenge
will be initialised; similarly for the other 2 challenges.
• ProgressPage: this class displays a bar chart with the user’s progress as read
from a file. It is the only class in the View package with methods other than
component initialisers. It reads a text file called “progresses.txt” for the user
currently logged in, then parses the data by extracting every milestone's value and
date. For example, “Thu 2015-02-27 3 0” means the user reached level 3 on that
specific Thursday (a minimal parsing sketch follows this list). With the precise date
information it compiles the data to be displayed on the bar chart. The page also displays the
last day of activity using the last known date from the file. JFreeChart30
was
used for the bar chart creation.
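A minimal sketch of parsing one line of that file is shown below; the meaning of the trailing field is not documented here, so it is simply ignored, and the class name is illustrative.

import java.time.LocalDate;

public class ProgressLineSketch {
    public static void main(String[] args) {
        String line = "Thu 2015-02-27 3 0";            // format described above
        String[] parts = line.split("\\s+");
        String dayName = parts[0];                     // "Thu"
        LocalDate date = LocalDate.parse(parts[1]);    // ISO date, 2015-02-27
        double level   = Double.parseDouble(parts[2]); // level reached on that day
        System.out.println(dayName + " " + date + " -> level " + level);
    }
}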
5.4.3 Controller Classes
These classes are linked to the correspondingly named equivalent in the view
package. When entered, their matching View class is called to display the window
elements. Each of these classes contains the functional methods and events to make
user interaction possible. They also control the animation module through the
BgFrame by notifying its thread.
• Login: this class contains the main method and is the starting point of the
program logic. It creates the BgFrame instance and through it, initialises the
animation process in the animation module. It also uses a text parser to load
user data from the file “users.txt” when the login button in the LoginPage is
clicked. User details are checked to see if an account for the entered username
exists; otherwise the user is asked to sign up.
• SignupDetails: if the user does not have an account they must sign up. This
class simply receives all the information from the SignupDetailsPage
components such as text fields and colour buttons. It then creates a new
account if the user is signing up, or updates current details if the user is editing
them.
• Main: once an account is created, or the user is logged in correctly, this class
calls the window elements in MainPage and waits for user input. Depending
on which buttons are pressed, the class will call the constructors for the chosen
section. For example clicking on a button under the Lesson category will call
the Lesson class, clicking the Progress button will call the Progress class and
so on.
• Lesson: when the user selects a lesson, this class reads the lesson data from the
“lesson.txt” file. The file structure includes an integer determining the lesson
number, with each following line describing the sign, the sentence and the sound file
to be played. The class then initialises the UI components through the LessonPage and
wakes up the animation thread. In fact, the animation thread is started in the Login
class when the program is first launched and is then put in an idle state waiting for
user actions. Waking up the thread causes the animation file (specified in the
lesson.txt file) to be loaded and the animations to be constructed; the animation
viewport is then opened and the animation displayed and looped (a minimal sketch of
this wait/notify pattern is given after this list).
• Test: this class, analogously to its View package equivalent, has more
methods than the others in this package. In fact it contains all of the button
events and other action methods for every test category. However only those
required for the currently displayed test are employed. Similarly to the Lesson
class, the file containing the test data is read and loaded into an array. The data
is then sent to the animation module and other elements like questions and
multiple choice answers are displayed on the screen.
• Progress: this very short class simply calls the constructor of ProgressPage to
initialise UI elements. It also contains methods to discard them and load the
previous Menu window.
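The idle-and-wake behaviour described for the Lesson controller can be pictured with the following wait/notify sketch; the class, field and method names are illustrative placeholders rather than the project's actual code.

public class AnimationThreadSketch implements Runnable {
    private final Object lock = new Object();
    private String pendingClip;                         // Collada animation file to load

    // Called by the Lesson controller when a lesson page is opened.
    public void playClip(String colladaFile) {
        synchronized (lock) {
            pendingClip = colladaFile;
            lock.notify();                              // wake the idle animation thread
        }
    }

    @Override
    public void run() {
        while (true) {
            String clip;
            synchronized (lock) {
                while (pendingClip == null) {
                    try { lock.wait(); }                // idle until a lesson provides a clip
                    catch (InterruptedException e) { return; }
                }
                clip = pendingClip;
                pendingClip = null;
            }
            loadAndLoopAnimation(clip);                 // placeholder for the real work
        }
    }

    private void loadAndLoopAnimation(String clip) { /* build, display and loop the clip */ }
}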
5.4.4 Extra Classes
This package includes a number of classes very different from one another. They
mainly represent auxiliary objects employed by the program to fulfil different tasks.
• HideAndProgress: this special JFrame window represents a loading screen.
Its task is to appear in front of the animation viewport before the animation is
played. This is done because building an animation takes a few seconds, and
in that period the user would just be left waiting for something to happen.
With a loading screen we can give feedback to the user letting them know
approximately how much time is left with a progress bar and description of the
background process (Fig. 33).
• MP3: This class builds a sound object to be played back during animations. It
takes a file path and reads .mp3 files. Java includes a .wav player in its built-in
libraries, but .wav files are much larger than compressed .mp3 files, so using an
extra library saves storage space compared with storing many .wav files; it is also
very easy to use. Each lesson animation has an audio file reading the same sentence in
English to the user. The MP3 module uses the JLayer33 library (a minimal playback
sketch follows below).
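A minimal playback sketch using JLayer's Player class is shown below; the file name is illustrative.

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import javazoom.jl.player.Player;

public class Mp3Sketch {
    public static void main(String[] args) throws Exception {
        try (BufferedInputStream in =
                     new BufferedInputStream(new FileInputStream("lesson1.mp3"))) {
            new Player(in).play();     // blocks until the clip has finished playing
        }
    }
}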
33 JavaZOOM. (2008, November) JLayer - MP3 Library. [Online]. http://www.javazoom.net/javalayer/javalayer.html
Figure 33: Progress window, an instance of HideAndProgress shows the background
process of the animation construction to the user.
Final Report - Moscholios
Final Report - Moscholios
Final Report - Moscholios
Final Report - Moscholios
Final Report - Moscholios
Final Report - Moscholios
Final Report - Moscholios
Final Report - Moscholios
Final Report - Moscholios
Final Report - Moscholios
Final Report - Moscholios
Final Report - Moscholios
Final Report - Moscholios
Final Report - Moscholios
Final Report - Moscholios
Final Report - Moscholios
Final Report - Moscholios
Final Report - Moscholios
Final Report - Moscholios

More Related Content

Similar to Final Report - Moscholios

Obstacle detection in images
Obstacle detection in imagesObstacle detection in images
Obstacle detection in images
hasangamethmal
 
Gauthier_digitaldesign_portfolios
Gauthier_digitaldesign_portfoliosGauthier_digitaldesign_portfolios
Gauthier_digitaldesign_portfolios
Jean-Marc Gauthier
 
Telecottage_Handbook__How_to_Establish_and_Run_a_Successful_Telecentre
Telecottage_Handbook__How_to_Establish_and_Run_a_Successful_TelecentreTelecottage_Handbook__How_to_Establish_and_Run_a_Successful_Telecentre
Telecottage_Handbook__How_to_Establish_and_Run_a_Successful_Telecentre
Yuri Misnikov
 
Head_Movement_Visualization
Head_Movement_VisualizationHead_Movement_Visualization
Head_Movement_Visualization
Hongfu Huang
 

Similar to Final Report - Moscholios (20)

steganographyfinalreport (deepu) (1) - Deepak Yadav.pdf
steganographyfinalreport (deepu) (1) - Deepak Yadav.pdfsteganographyfinalreport (deepu) (1) - Deepak Yadav.pdf
steganographyfinalreport (deepu) (1) - Deepak Yadav.pdf
 
Obstacle detection in images
Obstacle detection in imagesObstacle detection in images
Obstacle detection in images
 
Face Mask Detection System Using Artificial Intelligence
Face Mask Detection System Using Artificial IntelligenceFace Mask Detection System Using Artificial Intelligence
Face Mask Detection System Using Artificial Intelligence
 
Master's Thesis
Master's ThesisMaster's Thesis
Master's Thesis
 
Egyptian Museum -by Augmented Reality- (Graduation Project Documentation)
Egyptian Museum -by Augmented Reality- (Graduation Project Documentation)Egyptian Museum -by Augmented Reality- (Graduation Project Documentation)
Egyptian Museum -by Augmented Reality- (Graduation Project Documentation)
 
LC_Thesis_Final (1).pdf
LC_Thesis_Final (1).pdfLC_Thesis_Final (1).pdf
LC_Thesis_Final (1).pdf
 
iMinds insights - 3D Visualization Technologies
iMinds insights - 3D Visualization TechnologiesiMinds insights - 3D Visualization Technologies
iMinds insights - 3D Visualization Technologies
 
Gauthier_digitaldesign_portfolios
Gauthier_digitaldesign_portfoliosGauthier_digitaldesign_portfolios
Gauthier_digitaldesign_portfolios
 
Smart Traffic Management System using Internet of Things (IoT)-btech-cse-04-0...
Smart Traffic Management System using Internet of Things (IoT)-btech-cse-04-0...Smart Traffic Management System using Internet of Things (IoT)-btech-cse-04-0...
Smart Traffic Management System using Internet of Things (IoT)-btech-cse-04-0...
 
NextMed: How to enhance 3D radiological images with Augmented and Virtual Rea...
NextMed: How to enhance 3D radiological images with Augmented and Virtual Rea...NextMed: How to enhance 3D radiological images with Augmented and Virtual Rea...
NextMed: How to enhance 3D radiological images with Augmented and Virtual Rea...
 
Final Report
Final ReportFinal Report
Final Report
 
Waymo Essay
Waymo EssayWaymo Essay
Waymo Essay
 
3D Printing And Designing Final Report.pdf
3D Printing And Designing Final Report.pdf3D Printing And Designing Final Report.pdf
3D Printing And Designing Final Report.pdf
 
Digital twin
Digital twinDigital twin
Digital twin
 
Saksham presentation
Saksham presentationSaksham presentation
Saksham presentation
 
3D Modeling The World Around
3D Modeling The World Around 3D Modeling The World Around
3D Modeling The World Around
 
Europe’s Digital Competitiveness Report
Europe’s Digital Competitiveness ReportEurope’s Digital Competitiveness Report
Europe’s Digital Competitiveness Report
 
Telecottage_Handbook__How_to_Establish_and_Run_a_Successful_Telecentre
Telecottage_Handbook__How_to_Establish_and_Run_a_Successful_TelecentreTelecottage_Handbook__How_to_Establish_and_Run_a_Successful_Telecentre
Telecottage_Handbook__How_to_Establish_and_Run_a_Successful_Telecentre
 
The CUTGroup Book
The CUTGroup BookThe CUTGroup Book
The CUTGroup Book
 
Head_Movement_Visualization
Head_Movement_VisualizationHead_Movement_Visualization
Head_Movement_Visualization
 

Final Report - Moscholios

  • 1. Queen Mary UNIVERSITY OF LONDON School of Electronic Engineering and Computer Science _______________________________________________ A REAL-TIME GENERATED 3D HUMANOID AGENT TO IMPROVE SIGN LANGUAGE COMMUNICATION Project Report Programme of Study, Computer Science Nicolaos Moscholios Supervisor, Dr Hatice Gunes 20th of April 2015
  • 2. 120690664 2 Abstract Communication between hearing impaired and hearing individuals who do not possess the ability to sign is seldom easily accomplished. This project attempts to improve the access to sign language for people not affected by deafness. By making the skill of signing easily available to a broader audience, contact between these two communities would hopefully improve. Through this system, users will be able to observe signing thanks to a 3D avatar generated in real time. With the use of computer graphic techniques, animations are created on the spot. As a result, users are able to change viewing angles to increase the learning experience and the understanding of signing gestures. In addition to the avatar, the application also implements a complete tutoring system with a range of lessons and tests to effectively teach and assess users’ progress.
  • 3. 120690664 3 TABLE OF CONTENTS 1. INTRODUCTION........................................................................................6! 1.1 Aims ............................................................................................................6! 2. REQUIREMENTS ANALYSIS......................................................................7! 2.1 Previous Projects .......................................................................................7! 2.1.1 Sign Language Structure.................................................................7! 2.1.2 Virtual Teaching of Sign Language ..............................................10! 2.2 Existing Similar Systems.........................................................................11! 2.2.1 Duolingo........................................................................................11! 2.2.2 Babbel ...........................................................................................13! 2.2.3 Assimil and Living Language........................................................14! 2.3 Survey .......................................................................................................14! 2.4 Interview...................................................................................................16! 2.5 Summary ..................................................................................................18! 3. REQUIREMENTS SPECIFICATION ............................................................19! 3.1 LWJGL – 3D Animation.........................................................................19! 3.1.1 Functional Requirements ..............................................................19! 3.1.2 Non-Functional Requirements ......................................................19! 3.2 Tutoring Application...............................................................................20! 3.2.1 Functional Requirements ..............................................................20! 3.2.2 Non-Functional Requirements ......................................................21! 3.3 Summary ..................................................................................................21! 4. DESIGN..................................................................................................22! 4.1 Aims ..........................................................................................................22! 4.2 Pedagogical Approach................................................................................22! 4.2.1 Learning and Assessing ................................................................22! 4.2.2 Marking Scheme............................................................................23! 4.3 User Interface Design..............................................................................24! 4.3.1 Avatar............................................................................................24! 4.3.2 Low Fidelity Prototype..................................................................25! 4.3.3 Rendered Prototype.......................................................................26! 4.4 User Interaction.......................................................................................27! 4.5 In-System Feedback ................................................................................30! 4.6 Summary ..................................................................................................31! 5. 
IMPLEMENTATION .................................................................................32!
  • 4. 120690664 4 5.1 3D Animation Module.............................................................................32! 5.1.1 Design Choices..............................................................................32! 5.1.2 Skeletal Animation ........................................................................33! 5.1.2.1 Skeleton Structure.............................................................................34! 5.1.2.2 World and Local Space.....................................................................35! 5.1.2.3 Mesh Structure..................................................................................36! 5.1.2.4 Joint Weights and Skinning Algorithm...........................................36! 5.1.2.5 Animation Algorithm........................................................................38! 5.1.3 OpenGL Engine in LWJGL...........................................................40! 5.1.3.1 AnimationModule Class ...................................................................40! 5.1.3.2 Display Lists.......................................................................................40! 5.1.4 Summary........................................................................................41! 5.2 Development Environment.....................................................................42! 5.3 High Level Design....................................................................................42! 5.3.1 Model.............................................................................................42! 5.3.2 View...............................................................................................43! 5.3.3 Controller......................................................................................43! 5.4 Low Level Design.....................................................................................43! 5.4.1 Model Classes ...............................................................................44! 5.4.2 View Classes .................................................................................44! 5.4.3 Controller Classes.........................................................................45! 5.4.4 Extra Classes.................................................................................46! 5.5 Difficulties.................................................................................................47! 5.5.1 Inter-compatibility.........................................................................47! 5.5.2 Visual Consistency ........................................................................48! 5.6 Summary ..................................................................................................48! 6. TESTING ................................................................................................49! 6.1 System Testing .........................................................................................49! 6.1.1 Performance..................................................................................49! 6.1.1.1 Loading times ....................................................................................49! 6.1.1.2 Computing Load................................................................................50! 6.1.2 Heuristic Evaluation .....................................................................51! 6.2 User Testing..............................................................................................53! 
6.3 Summary ..................................................................................................54! 7. CONCLUSION AND FUTURE WORK.........................................................55! 7.1 Summary and Conclusions .....................................................................55! 7.2 Future Work ............................................................................................55! 7.2.1 Enhancements ...............................................................................55! 7.2.2 Improvements................................................................................56! 7.2.3 Expansion......................................................................................56!
  • 5. 120690664 5 8. BIBLIOGRAPHY .....................................................................................57! 9. BACKGROUND READING .......................................................................59! 10. APPENDIX............................................................................................60! 10.1 Survey Questions ...................................................................................60! 10.2 Interview Questions...............................................................................62! 10.3 Profiler Results ......................................................................................63! 10.4 3D Animation Module Class Diagram.................................................64! 10.5 Signer Class Diagram............................................................................65!
  • 6. 1. Introduction 120690664 6 1. INTRODUCTION For everyone around the world, communication through speech comes as a certain result of human nature. Since a young age we are encouraged to speak, and simply because of the environment around us we are driven to use our voice as a medium of communication. However some of us are not that fortunate and are born deaf, therefore needing to rely on sign language to achieve a similar result to hearing individuals. The issue is that hearing and deaf communities have very little available means to rapidly convey information to each other. Consequently, the best solution is for hearing people to learn sign language. Sign language is known for being very difficult to learn. In fact, experts say it is comparable to learning Japanese from scratch. For several years, scientists have tried to teach sign language in more accessible ways through technology and while virtual tutoring has been around since years in the form of pre-recorded videos lessons, only now we are seeing a shift towards 2D or 3D animated tutors thanks to the advancement in research and the abundance of tools available. But given the niche market of virtual tutoring in sign language compared to spoken languages, it is not surprising that the majority of projects trying to tackle this issue are usually incomplete or even non-existent. As a consequence, hardly any hearing individuals learn sign language through the existing tools. 1.1 Aims The aim of this project is to make British Sign Language available to the broader audience by implementing a tutoring system able to trespass the barrier of pre- synthesized video. Through this system, individuals interested in gaining the skill will be able to learn by observing and performing, and subsequently improve by practicing and assessing themselves. A virtual avatar will be used as a medium to demonstrate signs; with 3D animation, users are allowed to move and observe what is being taught from different perspectives, increasing their understanding of gestures and learning experience. The pedagogical approach to deliver this complex skill is discussed in Section 4.2 of this report, while the development and creation of a 3D avatar that can support sign language as a form of interaction is covered in Chapter 5. A requirement analysis was undertaken to help identify the current systems and technologies that may be used for this task (Chapter 2). Additionally, Chapter 6 will cover system and user testing to assess the effectiveness and usability of the envisioned idea.
2. REQUIREMENTS ANALYSIS

The requirements capture for this project included:

• Research into previous projects, most of them from other universities and conference proceedings
• A review and evaluation of existing systems, including applications for sign language support as well as language tutoring programs
• An online survey aimed at the general public, to gather feedback on existing learning techniques and stakeholders' views on design ideas
• An interview with Prof Bencie Woll, Chair of Sign Language and Deaf Studies at University College London

This section examines how past projects approached the implementation of a tutoring system for sign language, identifies the mistakes to be avoided, and considers how existing systems can be improved and adapted to the teaching of a skill that is very difficult to acquire.

2.1 Previous Projects

A great deal of research is currently being carried out on sign language recognition, but much less on sign language generation. Furthermore, while there is a large number of online tutoring systems available for spoken languages, the few targeting sign language are not very appealing.

2.1.1 Sign Language Structure

Generating sign language is one of the main objectives of this project, so research into methods and techniques to achieve this was necessary. In the work of Delorme et al.1, sign language is described as a sequence of signs divided into timing units that should be treated separately. Because SL is a natural language, signs are formed according to syntactic and spatio-temporal rules; Chinese language works in a similar fashion. This differs from spoken languages and transcription systems such as Braille, where the structure can be subdivided into segments like sentences, words and even smaller units. In addition, the two language families use different modalities: visual-gestural versus auditory-vocal. The Zebedee approach used in their project divides signs into timing units, where each unit is either a key posture or a transition. Each key posture (key frame) is specified with a duration in milliseconds. This matters because signs characterised by the same movement can change with context. For example, the sign for a tall building (in French Sign Language) is made by the two hands going up roughly 20 cm apart; by changing the distance, the building can be described as thin or large.

1 Delorme M, Filhol M, Braffort A. Animation Generation Process for Sign Language Synthesis. In: 2009 Second International Conferences on Advances in Computer-Human Interactions; 2009: IEEE. pp. 386-390.
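To make the timing-unit idea concrete, the sketch below shows one possible way such a description could be modelled in code. It is a minimal illustration written for this report, with hypothetical class and field names; it does not reproduce the actual Zebedee XML schema, only the notion of a sign as an ordered list of key postures and transitions, each with its own duration.

```java
import java.util.List;

/**
 * Hypothetical model of a sign as a sequence of timing units, loosely
 * inspired by the key-posture/transition idea described above.
 */
public class SignDescription {

    /** One timing unit: either a held key posture or a transition between two postures. */
    public record TimingUnit(String type,          // "KEY_POSTURE" or "TRANSITION"
                             int durationMs,       // how long the unit lasts
                             List<JointTarget> targets) { }

    /** Target rotation for a single joint, e.g. the right wrist, in degrees. */
    public record JointTarget(String jointName, float rotX, float rotY, float rotZ) { }

    private final String gloss;            // e.g. "BUILDING-TALL"
    private final List<TimingUnit> units;  // ordered timing units making up the sign

    public SignDescription(String gloss, List<TimingUnit> units) {
        this.gloss = gloss;
        this.units = units;
    }

    /** Total playback time of the sign, used when scheduling the animation. */
    public int totalDurationMs() {
        return units.stream().mapToInt(TimingUnit::durationMs).sum();
    }

    public String gloss() { return gloss; }
    public List<TimingUnit> units() { return units; }
}
```

A contextual parameter such as the hand distance in the "tall building" example could then be expressed by adjusting the joint targets of the relevant key postures before playback, rather than authoring a separate animation for each variant.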
Similar examples can also be found in BSL (British Sign Language). On this subject, it is important to know that there are many different sign languages around the world, just as there are many spoken languages in different countries. The BSL family includes BSL, Australian Sign Language and New Zealand Sign Language. On the other hand, "American Sign Language and Irish Sign Language belong to the LSF (langue des signes française) family, which is unrelated to BSL, and BSL and LSF are not mutually intelligible"2. There is also an International Sign language, which has not been standardised. Since this project is being developed in a British English speaking country, it was decided to use BSL.

Returning to Delorme et al., because each sign has its own transitions from starting posture to end, XML was used to describe these variables. A skeletal system was set up with each bone containing a set of variables. Figure 1 illustrates how each sign is translated from XML to the final animation. Because XML files have a hierarchical structure, they are a natural fit for describing a bone system. Skeletons are used to build characters in any type of animation, with each bone having a child and a parent (except for the root). Storing this kind of structure hierarchically makes it possible to save each sign in one file, which makes it easier to modify; however, it can cause redundancy if many signs share similar patterns. Other projects have used techniques like this one to generate sign language3.

At the University of Campinas, Brazil, students have created an avatar to generate sign language in real time4. Figure 2b shows the mesh and bone system used to display the signs.

2 DCAL. University College London. [Online]; 2014 [cited 2014 September 22]. Available at: http://www.ucl.ac.uk/dcal/faqs/questions/bsl/question6.
3 Yi-Chun Lin, James Jiunn-Yin Leu, Jyh-Win Huang, and Yueh-Min Huang, "Developing the Mobile 3D Agent Sign Language Learning System", in The 6th IEEE International Conference on Wireless, Mobile, and Ubiquitous Technologies in Education, 2010, pp. 204-206.
4 Wanessa Machado do Amaral, José Mario De Martino, and Leandro Martin Guertzenstein Angare, "Sign Language 3D Virtual Agent", FEEC, University of Campinas, Department of Computer Engineering and Industrial Automation, Campinas, PhD Thesis.

Figure 1: Translation from XML to BioVision Hierarchical data file. From Delorme et al.
Their argument for using a virtual character instead of pre-recorded video is that it does not require expensive filming equipment, there is no need to hire a signer (which also preserves anonymity), and it requires much less storage space, as the files are plain text rather than video. In addition, it gives the viewer freedom to change the point of view of the character while it is signing. Their project follows the STOKOE system, the first standardised sign language notation system. STOKOE uses three elements to describe a sign:

1) Tabula: the hand location in world space
2) Designator: the hand shape
3) Signation: the hand movement

The hand movements are described in a hierarchical notation, with the sign class as the root and Hold, LocalMovement and GlobalMovement as its children. Hold and LocalMovement each have 3 additional children. LocalMovement represents the forearm, hand and wrist movements, whereas Hold defines the left and right sides and the head. GlobalMovement is the trajectory between two Holds in the same sign, and therefore contains variables such as Speed, Orientation and Repeat to create a movement or a pattern for the body elements to follow. This kind of structure allows the model to be altered simply by adjusting the variables contained in each bone. As seen in Figure 2, the character is a realistic representation of a woman, created in Autodesk Maya and exported as an FBX ASCII file, while the skeleton structure is exported as an XML file storing the joints and angles needed to achieve the different hand shapes.

A third project, from the University of Tunis5, achieved something very similar, but developed for the mobile platform. Using an avatar and a database storing X3D files, it allows users to translate text from the browser or other applications on their phone. Their objective is also to overcome traditional video limitations related to bandwidth constraints and video merging. In addition, thanks to the complementary database, communities can build their own dictionary, making it very flexible in terms of available sign language expressions.

5 Boulares Mehrez and Mohamed Jemni, "Mobile Sign Language Translation System for Deaf Community", in 21st International World Wide Web Conference, Lyon, 2011.

Figure 2: Picture of the mesh (a), skeleton (b), closed mesh (c). From the University of Campinas.
Using SML, a modified version of XML, they store information for each joint, with a name and an initial rotation value. When the user requests a translation, the data is retrieved from the database, and translating the SML information into an X3D file generates the animation. Finally, the animation is converted to video and shown to the user. The project goes deeper into the mathematical details of how the animation is constructed and the techniques employed; these will be discussed in the Implementation chapter of this report.

One of the few successful applications created to ease communication between deaf and hearing individuals is MotionSavvy6. This unique product, created by an all-deaf team, allows signs to be performed in front of a custom-built tablet case containing a Leap Motion controller. The movements are then translated into spoken language that the tablet outputs as sound. This makes communication possible between the two users; however, the product is aimed at fluent signers who need to interact with a non-signer, and it does not actually teach the language.

It is worth noting that the majority of the work done on sign language generation follows a broadly similar pattern: the 3D model is created as one specific object, and a skeleton modifies the mesh by changing the positions in space of the points that constitute it. Despite the fact that most of these projects are able to generate sign language, they are not aimed at teaching it.

The learning process is very slow for non-hearing individuals; in fact, 4 years of learning for deaf children is equivalent to 1 year for hearing children7. Multiple projects tackle the issue of sign language being so difficult to teach. For schools, having a signer visit can be expensive, and not every adult has the time and budget to take evening classes with personal tutors. The next section therefore analyses existing techniques used to teach sign language virtually.

2.1.2 Virtual Teaching of Sign Language

There are plenty of applications around the Internet to assist the teaching of sign language, from online tutorials with real teachers to simple applications showing illustrations of signs. At the Technical University of Istanbul8, students have employed a humanoid robot to assist the teaching of SL to deaf children through interactive games. The robot narrates a short story and the children have to identify certain keywords (in this case gestures) included in the signing. They then have to show the flashcard corresponding to the word, which is recognised as correct or incorrect by the robot. There is an element of visual feedback, where the robot's eyes change colour depending on the answer given. In addition, the story does not continue until the correct card is shown; as a result, learning becomes, in a sense, mandatory in order to continue the activity.

6 MotionSavvy. MotionSavvy. [Online]. http://www.motionsavvy.com/
7 M. Marschark and M. Harris, "Success and failure in learning to read: The special case of deaf children", Reading comprehension difficulties: Processes and intervention, pp. 279-300, 1996.
8 Hatice Kose, Rabia Yorganci, and Itauma Itauma I., "Humanoid Robot Assisted Interactive Sign Language Tutoring Game", in 2011 IEEE International Conference on Robotics and Biomimetics, Phuket, 2011, pp. 2247-2248.
Another system, created at the University of Maynooth9, uses a different approach. A computer-based virtual learning environment asks the user to perform a sign after watching an avatar. The sign is then analysed with a webcam (with the help of coloured gloves worn on the hands) to assess the student's performance. Once the sign is performed correctly, the tutor moves on to the next sign. Again, the user is held back from progressing until the correct answer is given. In Tunis, Jemni and Elghoul developed a web tool, WebSign10, which creates courses for deaf pupils, using graphics as an efficient pedagogical approach. It is very similar to the previous approach, but aimed at children; the vocabulary and general style of teaching are therefore quite different, in order to adapt to the comprehension level of younger learners. One key feature all these systems have in common is that they provide an immersive learning experience while still being able to run on a standard PC, which is also what our system is trying to achieve to some extent.

2.2 Existing Similar Systems

There is a wide range of language learning software available on the Internet. This section discusses some of the most successful products, the features they offer, and the strengths and weaknesses they present.

2.2.1 Duolingo

Duolingo11 is regarded by PCMag as "by far the best free program for learning a language"12 and offers an experience similar to having a personal tutor "in your pocket"13, as it is available both online and as a phone/tablet application. However, it only covers a handful of languages, such as French, German, Italian and Spanish. It includes very good features that help you practise a language, but not necessarily master it. In addition, it forces you to go through many levels before you reach tutorials and questions appropriate to your current skill level. During the learning process, you advance through the lessons by unlocking the next level with high scores. At the same time, your progress can be compared with your friends' on an online global

9 Daniel Kelly, John McDonald, and Charles Markham, "A System for Teaching Sign Language using Live Gesture Feedback", Department of Computer Science, N.U.I. Maynooth, Maynooth, 2008.
10 Mohamed Elghoul and Jemni Oussama, "Using ICT to teach sign language", in Eighth IEEE International Conference on Advanced Learning Technologies, 2008, pp. 995-996.
11 Duolingo. duolingo.com. [Online]. https://www.duolingo.com/
12 Jill Duffy. (2013, January) pcmag.com. [Online]. See Bibliography for URL.
13 Seth Stevenson. (2014, February) www.independent.co.uk. [Online]. See Bibliography for URL.

Figure 3: Picture of the NAO H25 robot.
rating table (this feature is known as "gamification")14. Duolingo includes a personal trainer, represented by a green owl (Fig. 4), that allows you to set daily goals; the application does not allow customisation of this avatar. It uses photographic association as an additional learning technique, to aid memory when discovering nouns (Fig. 5). One problem with pictures is that users may extract too much information from them and become distracted from the words they are learning (guessing the correct answer instead of actually knowing it). This could be addressed by, for example, making the picture a hint that the user has to select at the cost of a point or something equivalent.

14 S.A.P. | The Hague. (2013, June) www.economist.com. [Online]. http://www.economist.com/blogs/johnson/2013/06/language-learning-software

Figure 4: Picture of the Duolingo avatar showing the user's progress.
Figure 5: Picture of the Duolingo interface. A standard quiz is shown here; as can be seen, the pictures make finding the answer much easier.
Another issue is that the quizzes focus too much on mastering structural blocks of knowledge rather than conversational skills. This means that if you do not already know the basics of conversation, it can be quite difficult to progress, at least in the early stages. This makes it a good system for serious, committed beginners and long-term learners, while being less suited to casual learners. Still, it has been named the "most productive means of procrastination" and reaches 16-20 million users a week, making it the most popular online language tutor13.

2.2.2 Babbel

Babbel15 is a paid tutor for anyone who "doesn't mind online-only programs"12. While very similar to Duolingo, it also does not offer instructor-led web classes. However, it has a wider variety of languages, currently supporting 11. Unlike Duolingo, Babbel focuses on building basic conversational skills. For some languages it is indeed more interesting to speak than to read (in this case, sign language is considered "spoken", because it is characterised purely by visual elements). Occasionally it sets immersive lessons that explain grammatical concepts. Different categories are explored, including listening, speaking, reading/writing and building sentences from words. It also uses picture association to help learning, though less prominently than Duolingo. The voice recognition does not work very well and the interface is buggy at times. It does allow you to set your level of knowledge (meaning you can start at a higher level from the beginning), but that is essentially the extent of the options available. On the mobile version you can set a picture for your profile, but the virtual conversation does not show any sort of avatar (Fig. 6).

15 Babbel. babbel.com. [Online]. http://www.babbel.com/

Figure 6: Picture of the Babbel mobile interface. The interface is kept simple and easy to read. Conversational methods are the main teaching technique, along with storytelling, as in this example.
2.2.3 Assimil and Living Language

These two are less popular than the previous systems. Assimil16 is a French company that publishes foreign language courses. Instead of online tutoring, it produces books and tapes from which users learn by listening and reading physical copies. It offers a wide variety of languages, available in packages such as "on the road" and "advanced", and can suit people who are less comfortable with computers. Each book costs around €20, making it relatively expensive. Living Language17 is a blend of Assimil and online courses, giving the possibility to read books, listen to tapes and follow web tutorials with real teachers.

2.3 Survey

A survey was carried out to gather feedback from the community. It was a good opportunity to collect thoughts and comments on the concept and to get a clearer idea of whom the project should target. The questionnaire comprised 10 questions, covering learning techniques as well as personal opinions on design decisions. A total of 85 responses were collected, and two versions were made available, in English and in French. The latter was created because a few individuals from the French-speaking community came forward and explicitly showed interest in the project, so the effort was made to produce a French version of the survey. The full survey questions can be found in the Appendix of this report, while complete results are in the Supplementary User Data in the supporting material submission.

Although results were similar between the two communities, there were also some differences. When participants were asked what made them fluent in a language other than their first, most selected "good teaching at school", which is predictable since most of us learn faster and more easily while at school (Fig. 7). However, most English-speaking participants also selected staying abroad as a positive component of their learning process, whereas the French participants felt that working on their own helped more. Either way, this suggests that nearly everyone had to put in effort on their own to achieve their results, which brings us to the next set of findings. When asked which techniques they would use to learn a new language, most people replied with

16 Assimil. Le don des langues. [Online]. http://fr.assimil.com/
17 Living Language. The accent is on you. [Online]. http://www.livinglanguage.com/

Figure 7: Learning techniques. Response options: good teaching in school; very hard work on my own; natural talent, picked up the language easily; staying abroad. Results shown for the English (EN) and French (FR) groups.
using a software/application and travelling abroad. This was further confirmed later, when participants were asked which techniques they would find most useful if they had to learn sign language (Fig. 8). Although the division is subtle, the results show that the English-speaking community gives high importance both to direct contact with deaf people and to online learning to get the best out of a new language. The French community gave a similar weight to direct contact, but seems to prefer training with experts over using a computer application. This could be due to the lack of learning software available in French, and thus a lack of trust in programs for this kind of task. There is clearly a lack of personalisation in terms of the base language, as most applications released today are developed by American or British companies and follow an "English to X" model of language learning. Personalisation was in fact also a point most people agreed on, as seen from the results, although some said it was not very important. This was a curious answer, since the question concerning the colour of the avatar received very diverse answers from both communities. The distribution of the colours chosen is shown in Figure 9 below.

As shown in Figure 9, the most chosen colour was light skin, with blue as the runner-up. However, it was interesting to see that the English-speaking community also left comments (in

Figure 8: Learning techniques for SL. Response options: training at deafness centres; reading on your own; online application tutoring; spending time with deaf individuals and practising with them. Results shown for the English (EN) and French (FR) groups.
Figure 9: Preferred colours. Options: blue, red, dark skin, light skin, other. Results shown for the English and French groups.
"Other") on themes such as discrimination and sexuality. Some participants said, "I think it's good to have a neutral colour to avoid racism and discrimination" and "Doesn't matter in the slightest". As a result, it was decided to keep the choice open, with an option to personalise the colour both when the account is created and afterwards. Interestingly, the French community did not leave comments of this sort, with the vast majority preferring light skin colour over all other options. Most participants also felt there should be a choice between a female and a male avatar; however, due to time restrictions and the complications that would arise from further modifying the model, it was decided to keep it male. In-system feedback was also covered, and it seems everyone would prefer to have both voice and visual feedback while using the application. Deploying this survey proved a very valuable asset during the development of the system, and the results sometimes showed unexpected responses. All the feedback was taken into consideration, and particular attention has been given to the personalisation of the avatar. The next section discusses the results of the meeting with Prof Bencie Woll, held to increase understanding of the deaf community and of the way people learn sign language.

2.4 Interview

An interview was conducted with Prof Bencie Woll, an expert and researcher in the field of deaf studies at UCL. The questions asked during the interview can be found in the Appendix of this report. Most of the questions were answered, although the interview took the form of an informal discussion on the topic and the project itself.

Originally, I was planning to create this project specifically for parents of deaf children since, according to Prof Woll, the majority of those parents never actually learn the language. This is not because they do not want to, but because they are usually advised that their children should mix with hearing people (getting them used to interacting with others to improve their social behaviour), while the adults should meet other parents of deaf children to get an idea of what their children's future may look like. Nevertheless, some interaction with the children has to exist, and parents will therefore have to learn at least some sign language. This is where Prof Woll advised me to change the scope of the project. When learning something new (a new language, how to write, how to draw, etc.) you need it to be taught correctly, otherwise you will struggle to adjust later on. The same applies to sign language. It is well understood that virtual learning is far from learning by practising with other people, i.e. two-way communication. Suppose the teaching model is slightly wrong. An adult learning from that model would eventually realise that some of the signs are not entirely correct, and this can be fixed by mixing with the deaf community. Teaching the same flawed model to a young child, however, can be very damaging. As a result, it was decided that the project should be aimed at adults in general and not specifically at parents of deaf children. Furthermore, even though the software is aimed at a broader public, "the quality of that first stuff must be good, otherwise they [users] can be discouraged because they learn something that doesn't work".
Prof Woll then went on to discuss the project itself and mentioned a few potential issues. Firstly, virtual models do not carry enough information. Unless they are generated through motion capture, avatars lack the "naturalness" of a real person. The face in particular lacks information that normally complements the hand movements. For example, most signed words are actually mouthed (lip movement only, no sound) to accompany the gestures. This is necessary because two signs can look the same but mean different things: "Finland" and "metal", for instance, are signed in exactly the same way, and only the lip movement differentiates them. Facial movement can also convey low or high intensity and emotions such as anger or happiness, while gaze follows the flow of the sign: if I am pointing somewhere with my hand, I should be looking in the same direction. Without facial information the signing appears monotone, much as Stephen Hawking's synthesised voice does in spoken language.

I showed Prof Woll the avatar design and she was pleased with it, only mentioning that the hands looked slightly too large, although this would not normally be a problem. Looking at the survey, she also said that there are barely any differences between male and female signers, so creating two models would be unnecessary. I mentioned that the model did not have any mouth movement implemented, since I knew that creating a working module for the arms and hands alone would take a long time. She then showed me a project from Martin Wright's Game Labs at London Met18, where a virtual avatar is used to generate signs. The model, however, looks far from human: it is a genie with a jet propeller at the waist and a large bearded head. The beard is actually a clever workaround for the lack of mouth movement, since it makes the mouth virtually invisible. This particular model uses motion capture, so even though it is not precise, the beard does show some movement.

She then briefly discussed the general lack of work on creating complete sentences in sign language. This project aims to create a program that generates 3D signs in real time by reading from a file. Prof Woll pointed out that tackling the creation of a sentence from more than one file would be more interesting: reading several 3D files and interpolating the positions in between to link them into a longer sentence. But there are a few issues:

• Reading from multiple files can generate the signs, but a written sentence has a different structure from the signed one. Sign files would therefore have to be pre-selected, which is not much different from simply using one file with the whole sentence already generated. "Although this can be explored, especially in the field of TV where subtitles could be converted to sign language for deaf spectators, it would require AI and Machine Learning and we are far from reaching that level of realism."
• Many things happen at the same time. In sign language, one hand could be doing one thing and the other something else, yet they could be two separate signs. How would a computer know whether they should be executed together or one after the other? In addition, hands that are not being used should be returned to the rest position or prepared for the next sign, and that also cannot be pre-described.

18 Martin Wright. (2015) GamelabUK. [Online]. http://www.gamelabuk.com/
As a final point, since this project aims at "improving sign language communication", it does not just explore the 3D generation of signs, but also the experience of learning the language. It was therefore suggested that the learning software should include some form of self-assessment, so that users can evaluate their own performance. As Prof Woll put it: "In spoken language, I can hear you speak and at the same time I hear myself speak. Hearing myself allows me to compare what I'm saying to what you're saying, easily knowing if it sounds right or wrong. But in sign language you do not see yourself sign, therefore it's hard to tell if you're making mistakes." By recording themselves, users could view their performance side by side with the original. This would allow them to see the mistakes they made, or how close they came to the original. As a further step, the two video streams could be compared using image processing to analyse their similarities and differences and assess objectively how close the attempt is to the target sign.

2.5 Summary

Due to the limited time frame and the vastness of the fields touched by the concept of virtual sign language teaching (image processing, real-time interpolation of signs, translation from written language, etc.), I came to realise, with the help of Prof Woll, that I had to decide exactly which problem I wanted to address in order to make this a working yet interesting project. From the original parent-focused concept, I refocused on creating an application for a broader audience. I believe this decision made it easier to relate to existing systems, which are aimed at a very general set of stakeholders, and at the same time prevented any potential damage to a child's learning in case of inaccuracies or confusion in the development of the signing avatar. With regard to the avatar itself, it was not possible to implement facial animation within the time available. However, particular attention has been given to the signing animation of the arms and hands in order to achieve results as close as possible to the human equivalent.
3. REQUIREMENTS SPECIFICATION

This section lists the functional and non-functional requirements of the system. Functional requirements have been divided into primary requirements (functionality that must be implemented, unless there is a valid reason why completion is impossible) and secondary requirements, implemented if time allows. The section is further divided into specifications for the 3D animation and tutoring application components. The specifications were determined from the results of the survey, the discussion with Prof Bencie Woll, and the background research.

3.1 LWJGL – 3D Animation

3.1.1 Functional Requirements

This core unit of the system is where the 3D model is animated to recreate the signs. With the help of LWJGL19, the unit should:

• Import a 3D model read from a Collada20 (XML) file format
• Draw the vertices, lines and polygons in a 3D coordinate system, with the appropriate colours also obtained from the information file
• Build a skeletal structure representing the base structure of the animated avatar
• Import the SRT (Scale, Rotation, Translation) matrices from the same XML file mentioned above
• Calculate the matrices for each node situated between bones
• Apply the matrices to the rotation of the nodes in order to place the skeleton in its rest pose
• Apply skinning to the previously imported mesh, i.e. calculate the weight that each node exerts on the vertices
• Import a secondary file containing the animation sequence to be executed on the avatar
• Allow a change of the point of view by rotating around the character
• Play and pause the animation at any time during playback

3.1.2 Non-Functional Requirements

• The animation should be smooth and responsive to playback commands
• The avatar should have an appealing look and come across as friendly
• The hand movement must be as close as possible to the human signing style

19 LWJGL. (2014, November) www.lwjgl.org. [Online]. http://www.lwjgl.org/
20 Collada, Collada – Digital Asset Schema Release 1.4.1, March 2008.
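To make the scope of this unit concrete, the sketch below outlines one possible public interface for the 3D animation module, summarising the functional requirements of Section 3.1.1. It is an illustrative sketch only; the type and method names are hypothetical and do not correspond to the final class structure (see Section 10.4 for the actual class diagram).

```java
import java.io.File;
import java.io.IOException;

/**
 * Hypothetical facade for the 3D animation unit, summarising the functional
 * requirements of Section 3.1.1. Names are illustrative only.
 */
public interface SignAnimationModule {

    /** Imports the avatar mesh, skeleton and SRT matrices from a Collada (XML) file. */
    void loadAvatar(File colladaFile) throws IOException;

    /** Imports a secondary file describing the animation (sign) sequence to play. */
    void loadAnimation(File animationFile) throws IOException;

    /** Places the skeleton in its rest pose and applies skinning weights to the mesh. */
    void bindSkeletonToMesh();

    /** Rotates the camera around the character so the sign can be viewed from another angle. */
    void rotateView(float degrees);

    /** Playback controls, available at any time during the animation. */
    void play();
    void pause();
}
```

Grouping the requirements behind a single facade like this also keeps the tutoring application decoupled from the OpenGL details, which matters later when the avatar has to be embedded in an interface frame.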
3.2 Tutoring Application

3.2.1 Functional Requirements

This part covers the requirements for the software with which users will interact. It includes both teaching and assessment, forming a complete tutoring scheme. As primary requirements, the system must:

• Allow browsing of the interface through mouse interaction
• Make a clear distinction between the learning material and the assessment work
• Include a point-and-rank system in the assessment work to judge the user's progress
• Automatically save the user's progress after each session, and reload it to the point of termination when the user returns
• Allow the creation of a (locally saved) account with a password, so that several users can share the program on one machine
• Provide customisation of the avatar when the account is created and at any later time
• Display the avatar animation from the LWJGL unit in a frame on the same page the user is on
• Provide text, sound and visual feedback in order to make it easier to memorise the signs
• Calculate the score of the tests taken and provide feedback on performance
• Support installation of the software, including all libraries, on Windows and Mac operating systems

If all of the above have been implemented, and if there is time left, the secondary requirements state that the system should also:

• Give feedback during testing by filming the user attempting to sign when prompted, and then showing a comparison of his or her movement side by side with the avatar

Originally, it was planned that the feedback-on-performance process would include the use of the Leap Motion21 controller, in order to analyse hand movement and calculate an approximate "similarity" between the user's sign and the correct one. However, the response from the interview with Prof Woll led to the decision to abandon the tool. Since it was at far too early a stage of development and could therefore worsen the usability of the system, it was considered preferable not to include it.

21 Leap Motion. leapmotion.com. [Online]. https://www.leapmotion.com/
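As an illustration of the locally saved account requirement above, the sketch below shows one way a user account could be persisted to a local properties file with a hashed password. This is a minimal sketch written for this report, assuming a hypothetical file name and field layout; it is not the implementation used in Signer, and a production system would also add per-user salting.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Properties;

/** Minimal sketch of local account persistence with a hashed password (hypothetical). */
public class LocalAccountStore {

    private final File file;  // e.g. new File("signer-accounts.properties")

    public LocalAccountStore(File file) {
        this.file = file;
    }

    /** Saves username, hashed password, avatar colours and current level. */
    public void save(String username, char[] password,
                     String avatarColours, double level) throws Exception {
        Properties props = load();
        props.setProperty(username + ".password", sha256(new String(password)));
        props.setProperty(username + ".avatar", avatarColours);  // e.g. "skin=light,eyes=blue"
        props.setProperty(username + ".level", Double.toString(level));
        try (OutputStream out = new FileOutputStream(file)) {
            props.store(out, "Signer user accounts (sketch)");
        }
    }

    /** Checks a login attempt against the stored hash. */
    public boolean checkPassword(String username, char[] password) throws Exception {
        String stored = load().getProperty(username + ".password");
        return stored != null && stored.equals(sha256(new String(password)));
    }

    private Properties load() throws IOException {
        Properties props = new Properties();
        if (file.exists()) {
            try (InputStream in = new FileInputStream(file)) {
                props.load(in);
            }
        }
        return props;
    }

    private static String sha256(String text) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(text.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b));
        return hex.toString();
    }
}
```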
3.2.2 Non-Functional Requirements

These define how the tutoring system should behave:

• The interface should be responsive and look simple to the eye, without omitting any fundamental functionality
• The program should be intuitive; we want to avoid distracting the user from learning sign language and must not let learning the system become the main activity

3.3 Summary

The overall system should follow the techniques used by other successful products in the same market area, in order to achieve similar user satisfaction and a positive learning experience. At the same time it should be innovative by moving beyond pre-synthesised video as a source of information towards a virtual, flexible and lightweight data-driven approach. The correctness of the animation implementation is crucial to the way the application as a whole is perceived. As Prof Woll pointed out: "the quality of that first stuff [referring to the learning material as a beginner] must be good enough, because nothing is more likely to discourage a person than doing something and then discovering that actually it doesn't work".
4. DESIGN

This section covers the structure and design of the tutoring application, discussing interface design choices, user interaction and the overall functionality of the system. These choices are backed up by references to existing similar products, the interview, the survey and feedback from potential stakeholders.

4.1 Aims

Signer, the name given to the system, is the application containing the infrastructure for a complete, user-friendly program. Through it, users are introduced to sign language and assessed so that they can progress steadily by building their knowledge and skills. From the background research and interview, it was established that the application should attempt to break the barrier of pre-synthesised video and make sign language more accessible to the hearing community. Given the potentially high complexity that comes with 3D animation, only the most fundamental and necessary actions should be made available to the user. However, as a learning tool, it should also include all of the basic functionality found in existing similar products.

4.2 Pedagogical Approach

The purpose of this project is to make BSL accessible to the hearing community through a virtual learning environment. It was therefore established that the pedagogical approach should be as close as possible to that of already popular and successful language learning systems. The tool is structured in an iterative format, where the user follows a lesson starting from the lowest level and subsequently takes a test in which their skills and learning are assessed. This short section describes how the teaching component was set up to teach sign language effectively and assess the learner's knowledge fairly. To accomplish this, further research into teaching techniques and into evaluating the correctness of signs in BSL had to be carried out.

4.2.1 Learning and Assessing

Unlike Duolingo11, and more like Babbel15, the application aims to steadily build the user's conversational skills in BSL. However, a structure analogous to Duolingo is used, where skills are first learnt and then assessed. Since I had no knowledge of BSL before starting this project, I needed to use available resources to start learning the skill myself. Out of all the appropriate guides and websites, I decided to use the For Dummies franchise's instructional book on British Sign Language22, as it would serve as a good basis of

22 Melinda Napier, James Fitzgerald, Elise Pacquette, and City Lit, British Sign Language For Dummies, 1st ed. London, UK: John Wiley & Sons, Ltd, 2008.
starting material to include in the project. To further increase my understanding of BSL, I used Prof Woll's book, The Linguistics of BSL23.

Lessons start with the simplest signs learnt by beginners, including "Hello", "I'm deaf", "I'm hearing", etc., building up to longer sentences requiring more nouns, adjectives and verbs. The difficulty increases with each level as the user progresses, learning something new each time. Each lesson comprises a short virtual conversation between two people, with a total of 6 sentences. Sentences have a varying number of signs depending on the complexity of the message to be conveyed. The user is advised to practise the signs shown to them in order to gain muscle memory and link English words to individual signs more easily; this is necessary in order to perform well in the assessment section. Every lesson has a corresponding test, labelled with an identical level number. Assessment is performed using 3 types of challenge:

• Translating: the user is given a sentence in English or in BSL and must translate it from one language to the other. BSL sentences are given as blocks of individual signs, where for example "Hello, I'm deaf" == HELLO ME DEAF, and vice versa.
• Recognising: the user is presented with a sentence performed by the virtual avatar and is given a choice of multiple answers in English, of which one has to be selected.
• Performing: this is where the learner's signing skills are tested. The user is given a sentence in English and must perform the matching BSL version. The original 3D rendition is then displayed side by side with the user's attempt, and the user chooses an appropriate mark out of 4 for each feature of the signing; this marking method falls into the category of self-assessment.

4.2.2 Marking Scheme

Each test is composed of 7 challenges, a mixture of the 3 described above, and points are given out of a total of 100. A test will usually include 3 translation, 2 recognition and 2 performing challenges. The challenges are assigned different weightings that make up the final mark (Table 1); a sketch of the resulting score computation is given after the list below.

• Translating counts for 10 out of 100: if all elements are present (all words or signs) and in the correct order, full marks are awarded; 3 points are deducted for each wrong word; if words are missing, 0 points are awarded.
• Recognising counts for 15 out of 100: a correct answer gives full marks, an incorrect answer gives 0.
• Performing counts for 20 out of 100, and here different elements are considered. The user gives themselves a mark from 0 to 4 for each of these features (self-assessment):
  i. Hand movement – accounts for 80% of the 20 points
  ii. Facial expression – accounts for 10%
  iii. Mouth movement – accounts for 10%

23 Rachel Sutton-Spence and Bencie Woll, The Linguistics of British Sign Language: An Introduction, 3rd ed. Cambridge, United Kingdom: Cambridge University Press, 1999.
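The sketch below illustrates how the weights above combine into a final mark. It is a simplified illustration written for this report, with hypothetical method and field names; it is not the code used in Signer, and it assumes the per-challenge marks have already been determined (by string comparison for translations, by the selected answer for recognition, and by the learner's self-assessment for performing).

```java
import java.util.List;

/** Simplified score calculation following the marking scheme above (names hypothetical). */
public class TestMarker {

    /** A performing challenge is self-assessed from 0 to 4 on three features. */
    public record PerformingMarks(int hands, int face, int mouth) {
        /** Weighted: hands 80%, face 10%, mouth 10% of the 20 available points. */
        double points() {
            return (hands / 4.0) * 16 + (face / 4.0) * 2 + (mouth / 4.0) * 2;
        }
    }

    /** Translating: 10 points, minus 3 per wrong word; 0 if any word is missing. */
    public static double translatingPoints(int wrongWords, boolean wordsMissing) {
        if (wordsMissing) return 0;
        return Math.max(0, 10 - 3 * wrongWords);
    }

    /** Recognising: all or nothing, 15 points. */
    public static double recognisingPoints(boolean correct) {
        return correct ? 15 : 0;
    }

    /** Sums the 7 challenge scores (3 + 2 + 2) and checks the 60/100 pass mark. */
    public static boolean passed(List<Double> translating, List<Double> recognising,
                                 List<PerformingMarks> performing) {
        double total = translating.stream().mapToDouble(Double::doubleValue).sum()
                     + recognising.stream().mapToDouble(Double::doubleValue).sum()
                     + performing.stream().mapToDouble(PerformingMarks::points).sum();
        return total >= 60;
    }
}
```

With three translating, two recognising and two performing challenges, the available points sum to 3×10 + 2×15 + 2×20, matching the overall weights in Table 1.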
The final mark is out of 100, and the user passes the test if they achieve at least 60 out of 100. The percentage for each feature had to be estimated based on the importance of these elements in BSL signing, as made clear by Prof Woll in the interview and in further correspondence: "For some sentences, a specific facial expression is an essential part of the sentence; for other sentences, some non-neutral facial expression is essential but not specified; for other sentences it's just not particularly relevant. The amount of mouthing varies across signers and also varies depending on what sign is being accompanied", B. Woll (personal communication, March 26, 2015). Because there is currently no standard for assessing the correctness of a particular signed sentence, it was decided to keep the weight of facial expression and mouthing low when calculating the final mark. Moreover, since the avatar does not replicate those facial features, it was in fact suggested to mainly consider what the learner is shown. Eye gaze is another element that should have been considered but did not fit within the scope of this project.

When progressing to the next level, the lesson for that level is unlocked, and the corresponding test is unlocked only once the lesson is completed. If the user attempts the test but fails to pass, their level is not increased and the test must be taken again.

Table 1: Marking Scheme

Challenge    | Weight (/100) | Occurrences | Overall Weight (/100)
-------------|---------------|-------------|----------------------
Translating  | 10            | 3           | 30
Recognising  | 15            | 2           | 30
Performing   | 20            | 2           | 40
Total        |               |             | 100

4.3 User Interface Design

Building upon the research into previous work and the survey, the user interface has been designed to integrate all the features necessary to make Signer a complete, user-friendly tutoring application. As mentioned in the requirements specification, the user interface should allow the 3D avatar to blend in smoothly, making it a homogeneous part of the interface. The interface has been kept as simple as possible to avoid confusing the user, which could result in low usability.

4.3.1 Avatar

In order to express sign language and communicate it to the user, a virtual avatar had to be designed (Fig. 10). Given the positive feedback on this from the survey, the original avatar design has been preserved (see Section 10.1 Survey Questions for pictures). When creating the avatar, two main features were kept in mind:

• It has to come across as friendly
• Its arm and hand shapes should be anatomically correct, so that the sign movements look as close as possible to the human equivalent

Following the interview feedback, it was also decided to remove the lower body (legs and hips) entirely, since it is not necessary for communicating sign language; the avatar is never meant to be shown from far away, so the user will not normally see the missing body parts. This was also done to decrease computation time by reducing the number of 3D points on the model (see Section 5.1 3D Animation Module). The 3D avatar was modelled in Cinema 4D24 and comes in a variety of colours, most of which can be found in the survey statistics (Fig. 9). The user is allowed to choose from a selection of pre-defined colours; Figure 10 shows a possible colour combination. Modifiable elements include the eye, hair and skin colours. Light blue was chosen as the background colour because it gives a more delicate effect than a completely white background.

4.3.2 Low Fidelity Prototype

24 Cinema 4D. (2014, October) www.maxon.net. [Online]. http://www.maxon.net/

Figure 10: Avatar as shown in-window.
Figure 11: Low fidelity prototype drawn on paper. Top left is the Login page, top right the Signup page, bottom left the home screen and bottom right the Lesson page.
A low fidelity prototype was designed on paper to create an initial, flexible visualisation of what could become the final look of the system (Fig. 11). It shows the 4 pages that had to be designed:

1. A login page
2. A sign-up page
3. The main menu of the application (and a way of editing user details)
4. A lesson page, where the avatar would be shown

The lo-fi prototype does not include a Test page for user assessment, because the design approach for that part was still uncertain at the time. The preliminary design was then brought into Photoshop25 and Illustrator26 to create a rendered version of the low fidelity prototype. This helped immensely in visualising the colour scheme, fonts and other layout factors to be considered during implementation. In addition, buttons and icons could easily be exported as images to be used in the hi-fi prototype. The chosen colour palette comprises sky blue, orange, dark grey and white (Fig. 12a). Blue usually conveys a feeling of peace and is therefore the most prevalent colour. Orange is the second most prevalent, given its pleasing appearance when matched with blue. Other systems have used similar colours in the past (Duolingo = blue, Living Language = orange), and since the two are aesthetically pleasing together, using them both was a natural design choice. Finally, white and grey have been used for those interface elements where colours other than blue and orange are necessary. These colours are used consistently throughout the interface. A logo was also designed and is visible at all times in the application (Fig. 12b).

4.3.3 Rendered Prototype

After being processed with Photoshop and Illustrator, the interface pages were used as a template when implementing the high fidelity prototype. All the elements created in the professional software were applied directly to the implemented system. This approach meant that the final version of the application's interface would look nearly identical to the pre-rendered result; application screenshots can be seen in Figure 13. The screenshots represent the final version of the system, meaning some original designs were modified after feedback from the project supervisor and potential stakeholders.

25 Adobe. (2015, January) Photoshop. [Online]. www.photoshop.com
26 Adobe. (2015, January) Illustrator. [Online]. http://www.adobe.com/uk/products/illustrator.html

Figure 12a: Colour palette.
Figure 12b: Signer logo as seen in the application.
4.4 User Interaction

For this section, please refer to the Task Flow and Screen Flow diagrams for more details (Fig. 14a & 14b). When first opening the application, the user is presented with a login page, where they can either log in, sign up if they do not yet have an account, or quit the program entirely.

Figure 13: In-window screenshots of the final interface designs. Login page (upper left), Sign-up page (upper right), Main menu page (lower left) and Lesson page (lower right).
Figure 14a: Low granularity Task Flow diagram illustrating the main actions available to the user.
Signing up requires the user to create a username and password, as well as choosing avatar characteristics. These include skin, eye and hair colours, which are applied to the 3D-generated avatar when demonstrating signs. This option was made available as a result of the survey question on customisation of the avatar, in which most people expressed interest (Fig. 9). When all the information has been entered correctly, a user account is created and the Main Menu is shown (Fig. 13, lower left). From the main page, the user can access lessons and tests, and can also change information such as avatar colours, password and username if they wish. The main page also displays the user's current level, informing them of which lessons and tests are available; the orange locks on the not-yet-completed sections further emphasise this. Clicking on "Profile" in the main page brings the user to the "Edit Details" page, which is nearly identical to the "Sign Up" page (Fig. 15). The two pages are in fact the same; the only differences are the title and the buttons available (in Edit Details the account can also be deleted), and the fact that "Sign Up" creates a new account whereas "Edit Details" simply updates the existing one.

Figure 14b: Screen flow diagram displaying the pages the user has access to once the application is launched.
Figure 15: Clicking on Profile brings up the Edit Details page, nearly identical to the Sign Up page in Fig. 13.
If a lesson is selected, the page shown in Figure 16a is presented. The avatar is loaded and the animation is displayed above the corresponding text accompanying the sign. Audio feedback is also available and is synchronised with the individual signs. This allows the user to concentrate on the animation and movement, rather than having to constantly switch between written text and video. The animation is looped and the sound can be turned off if preferred. The progress bar is placed at the side and increases with each new page that is opened. In the lo-fi prototype (Fig. 11) the layout of the lesson page is slightly different; the changes followed discussions with the supervisor and potential stakeholders, who pointed out that having the text beside the video would make it harder to follow. Figure 16b shows the different camera angles the user can select in order to better understand the gestures.

When the lesson is completed, the user unlocks the corresponding test (see the Test column in Fig. 13, lower left). Each test comprises recognising, translating and performing components (Fig. 18). In the performing component, the user is video-recorded and asked to perform a specific sign after being given a sentence in English. The camera records for the same amount of time it would take to perform the sign and then stops automatically; the user is informed of this time limit by a small red visual countdown (Fig. 17).

Figure 16a: The lesson page as seen in-program. The avatar is displayed performing the sign textually described below it. Black text → sign, white text → English.
Figure 16b: Points of view can be changed by pressing the buttons shown in Figure 16a. This can be done while the animation is playing or paused.
Once the user is happy with their interpretation (multiple attempts are allowed), they can submit the final version. The screen in Figure 18c is then presented, asking the learner to grade themselves on the features listed in the Marking Scheme (see Section 4.2.2). The avatar can also be seen with a different colour scheme, determined by the user's choice. Note that when the program is actually in use, the cartoon avatar is replaced with a video recording of the real user. When all challenges for a test have been completed and the overall mark exceeds 60%, the user's level is increased. This grants access to the next lesson in the list, which they will have to complete before attempting the matching test.

4.5 In-System Feedback

Taking lessons and passing tests is a good way of learning a skill, but people usually want some sort of feedback during their training. The background research on existing systems showed that most applications include a progress component where the user's advancement can be viewed. Signer offers the same functionality: progress is saved every time a milestone is passed.

Figure 17: Performing challenge and visual countdown (red sphere). Cartoon signer courtesy of http://www.british-sign.co.uk.
Figure 18: Recognising (a), Translating (b) and Performing (c) challenges in respective order. Differently coloured 3D avatar and cartoon signer courtesy of http://www.british-sign.co.uk.
Milestones denote attending a lesson (0.5 level up) and passing a test (1 level up). After logging in, learners can check their progress so far in the progress section (Fig. 19a). The last known activity is always displayed as the last day on the chart, so the listed days do not always start from Monday. The page also displays how many days have passed since the last activity, i.e. the last time a milestone was completed. In addition, after each test, a full breakdown of the user's strengths and weaknesses is shown (Fig. 19b). This adds an extra dimension to the learning experience, making areas for improvement easily identifiable and giving the user a precise idea of how well or badly they did. When users do not do very well, the feedback should be constructive without being negative or condescending. For example, while the message says "Congratulations, you passed!" for a good result, the feedback for a poor result should be more helpful. User testing was used to determine what kind of message users would prefer to see in that case (see Section 6.2 User Testing). Finally, should the user be unsure what to do on a particular page, an info button is available at all times; it lists and explains basic information about the current page to help the user follow the correct flow of actions.

4.6 Summary

This chapter described the user interface design of the Signer application. It supports all of the functionality listed in the requirements specification (Section 3.2.1). The teaching approach has been kept in mind at all times in order to make the interface elements easy to comprehend and the user task flow intuitive. The next chapter discusses the underlying implementation of the system, including the avatar animation.

Figure 19a: Progress page as displayed in Signer. The bar chart was developed using JFreeChart. Information on the last activity is shown below the chart, and a legend on the side complements it.
Figure 19b: Feedback window showing good points and possible improvements.
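As an illustration of how a progress chart like the one in Figure 19a could be produced with JFreeChart, the sketch below builds a small bar chart of milestone gains per day. It is a minimal sketch written for this report; the dataset values, class name and window title are hypothetical, and the actual Signer implementation may differ.

```java
import org.jfree.chart.ChartFactory;
import org.jfree.chart.ChartPanel;
import org.jfree.chart.JFreeChart;
import org.jfree.chart.plot.PlotOrientation;
import org.jfree.data.category.DefaultCategoryDataset;
import javax.swing.JFrame;

/** Minimal sketch of the weekly progress bar chart (hypothetical data). */
public class ProgressChart {

    public static void main(String[] args) {
        // Level gained per day: 0.5 for a lesson, 1.0 for a passed test.
        DefaultCategoryDataset dataset = new DefaultCategoryDataset();
        dataset.addValue(0.5, "Level gained", "Tue");
        dataset.addValue(1.5, "Level gained", "Wed");
        dataset.addValue(0.0, "Level gained", "Thu");
        dataset.addValue(0.5, "Level gained", "Fri");

        JFreeChart chart = ChartFactory.createBarChart(
                "Your progress", "Day", "Level gained", dataset,
                PlotOrientation.VERTICAL, true, true, false);

        JFrame frame = new JFrame("Signer - Progress (sketch)");
        frame.setContentPane(new ChartPanel(chart));
        frame.pack();
        frame.setVisible(true);
    }
}
```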
5. IMPLEMENTATION

This chapter goes through the software design features and choices that made it possible to implement the 3D avatar animation, as well as the tutoring software that delivers the avatar to the user as a learning tool. There are two main sections: the 3D Animation Module, consisting of the core OpenGL27 engine together with the XML parser and the algorithms used to generate the animation, and the user-accessible content with its underlying structure. Before going into the details of the system structure, it is necessary to explain how the avatar animation was developed.

5.1 3D Animation Module

Several sources of information and tutorials were carefully researched and employed to best understand the theory behind 3D animation. Part of the theory was already established through the Computer Graphics module studied in semester one; however, that was not enough, since this component focuses principally on skeletal animation. References to the aforementioned tutorials can be found in Chapter 9 Background Reading.

5.1.1 Design Choices

The main objective of this part of the project was to implement a module that loads a Collada file written in XML, interprets the information and builds an animation from it. Collada was chosen as the most suitable 3D file format, since it can contain nearly all the information needed (mesh, animation, joints, etc.) to correctly build the avatar. Other 3D formats like .obj or .fbx were incomplete or corrupted most of the time and were consequently not considered. Given my proficiency in Java acquired over the past years, it was chosen as the language of preference for the main structure, while LWJGL19 (Light Weight Java Game Library), a Java binding for OpenGL, was the best option for animating the avatar model. Figure 20 summarises the process from loading the Collada files to displaying the animations and shows how all the components are linked together; this figure can be used as a visual reference while reading about the implementation of the 3D module. Throughout this section of the report, be aware that the terms "bone" and "joint" refer to the same thing and are used interchangeably.

27 OpenGL. (2014, November) www.opengl.org. [Online]. https://www.opengl.org/
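To give an idea of what interpreting the Collada file involves, the sketch below uses the standard Java XML API to read the joint hierarchy from a Collada 1.4.1 document, where skeleton joints typically appear as nested <node type="JOINT"> elements under <library_visual_scenes> (each node carrying its transform as a <matrix> or separate translate/rotate elements, depending on the exporter). This is a simplified sketch written for this report; the class and file names are hypothetical, and the real module must also read the mesh, the SRT matrices and the skin weights.

```java
import org.w3c.dom.*;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.File;

/** Simplified sketch: printing the joint hierarchy of a Collada file (names hypothetical). */
public class ColladaSkeletonReader {

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new File("avatar.dae"));   // hypothetical file name

        // Find root joints: <node type="JOINT"> whose parent is not itself a joint.
        NodeList nodes = doc.getElementsByTagName("node");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element node = (Element) nodes.item(i);
            boolean isJoint = "JOINT".equals(node.getAttribute("type"));
            boolean parentIsJoint = node.getParentNode() instanceof Element p
                    && "JOINT".equals(p.getAttribute("type"));
            if (isJoint && !parentIsJoint) {
                printJoint(node, 0);   // e.g. the "Root" joint
            }
        }
    }

    private static void printJoint(Element joint, int depth) {
        System.out.println("  ".repeat(depth) + joint.getAttribute("name"));
        // Recurse into child joints only; other children hold the transform data.
        NodeList children = joint.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i) instanceof Element child
                    && "node".equals(child.getTagName())
                    && "JOINT".equals(child.getAttribute("type"))) {
                printJoint(child, depth + 1);
            }
        }
    }
}
```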
• 33. 5. Implementation 120690664 33
5.1.2 Skeletal Animation
Skeletal animation is a technique in which a character is drawn from two main parts: a surface representing the actual character, called the skin or, more formally, the "mesh", and a hierarchical set of bones or joints (the skeleton), which is then animated, driving the overlying skin. The model used for this project was created in such a way that it does not look too human (avoiding a creepy effect) but is not too unrealistic either, as otherwise it would be hard for the user to relate to it while signing28 .
In the Cinema 4D viewport shown in Fig. 21 we can see the avatar in bind position. The underlying skeleton can be seen through the skin. When the skin is attached to the skeleton, which is consequently hidden, the model is in rest position.
28 Nicoletta Adamo-Villani, Ronnie Wilbur, Petra Eccarius, and Laverne Abe-Harris, "Effects of character geometric model on perception of sign language animation", in 2009 Second International Conference in Visualisation, Barcelona, 2009, pp. 72-75.
Figure 21: Picture of the avatar mesh in bind pose. This is the original position of the modelled mesh.
Figure 20: 3D animation module process. The Shader calculates the animations and updates the mesh for each keyframe.
• 34. 5. Implementation 120690664 34
5.1.2.1 Skeleton Structure
As seen above, the model is normally positioned with the arms stretched out; this is called the bind (or pose) position. The skin is attached to the skeleton through a skinning algorithm (Algorithm 2), which is explained in detail later. The skeleton itself is composed of multiple interconnected joints arranged in a tree-like structure, which can be seen in the schema in Fig. 22.
This hierarchy of bones can also be represented as text (Fig. 23). The Root joint is what connects all other joints. We can thus say that each joint has at least one child joint, except for the end nodes (those including the keyword End). The names shown are those given to the joints inside the Collada file and are imported into the program so that each joint can be distinguished from the others.
Figure 22: Skeleton tree structure as seen in the Cinema 4D editor viewport.
Figure 23: Skeleton tree structure represented with text.
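To make the hierarchy concrete, here is a minimal sketch of how such a joint tree can be stored and printed as the text representation of Fig. 23 (the class shown is simplified and its names are assumed; the full list of attributes used by the program is given in the next subsection):

import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the joint hierarchy: each bone knows its name and
// its children, forming the tree rooted at the Root joint.
class Bone {
    String name;
    List<Bone> children = new ArrayList<>();

    Bone(String name) { this.name = name; }

    // Prints the tree as indented text, similar to Fig. 23.
    void print(String indent) {
        System.out.println(indent + name);
        for (Bone child : children) {
            child.print(indent + "    ");
        }
    }
}

Calling print("") on the Root joint walks the whole skeleton and produces one indented line per joint.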
• 35. 5. Implementation 120690664 35
5.1.2.2 World and Local Space
Each joint (called Bone in the program) has 5 main attributes:
• Name: this is the joint's name, used to distinguish it easily
• fileID: this is the unique ID the joint has in the XML file
• AnimationLocalMatrix: this is the matrix defining the local space transformation
• AnimationWorldMatrix: the matrix used for world transformations
• Children: this is an array of joints used to store references to this joint's children (or sub-nodes in the tree)
The local space transform is directly exported from the Collada file and is given in SRT (Scale Rotation Translation) format. This means the matrix contains all information about position in space, local rotations and scaling (not used here).
Translation = \begin{pmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix}
RotX = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta & 0 \\ 0 & \sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
RotY = \begin{pmatrix} \cos\theta & 0 & \sin\theta & 0 \\ 0 & 1 & 0 & 0 \\ -\sin\theta & 0 & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
RotZ = \begin{pmatrix} \cos\theta & -\sin\theta & 0 & 0 \\ \sin\theta & \cos\theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
When these are multiplied together as RotY ⋅ RotX ⋅ RotZ and the last column of the resulting matrix is replaced with the translation column, we obtain the Local Transformation matrix (Fig. 24). Local transformations are in fact the transformations of the child joint in relation to its parent. To be able to extract the correct point positions in space (or World positions), we must use Forward Kinematics to calculate the World Transformations for each joint. This is done in Algorithm 1:
for each bone in skeleton:
    if bone == 0 (i.e. the bone is the Root):
        bone WorldMatrix = bone LocalMatrix
    END if
    for each child of bone:
        child WorldMatrix = bone WorldMatrix * child LocalMatrix
    END for
END for
In the above pseudo-code, for each bone in the skeleton, each of its children's World matrices is computed by multiplying their Local transformation matrix with their parent's World matrix. The only exception is the Root, whose World matrix is the same as its Local matrix since it does not have a parent node.
Figure 24: Translation and Rotation matrices used in 3D animation.
Algorithm 1: Algorithm createRotationMatrices()
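As an illustration of Algorithm 1, here is a minimal Java sketch of the forward-kinematics pass. It assumes the Bone sketch above has been extended with two Matrix4f fields, animationLocalMatrix and animationWorldMatrix (field names assumed), and uses LWJGL's Matrix4f utility class, where Matrix4f.mul(left, right, null) returns a new matrix holding left * right:

import org.lwjgl.util.vector.Matrix4f;

// Sketch of Algorithm 1 (forward kinematics).
// Bone is assumed to also declare: Matrix4f animationLocalMatrix, animationWorldMatrix;
class ForwardKinematics {

    static void computeWorldMatrices(Bone root) {
        // The Root has no parent, so its world matrix equals its local matrix.
        root.animationWorldMatrix = new Matrix4f(root.animationLocalMatrix);
        propagate(root);
    }

    static void propagate(Bone parent) {
        for (Bone child : parent.children) {
            // child world matrix = parent world matrix * child local matrix
            child.animationWorldMatrix = Matrix4f.mul(
                    parent.animationWorldMatrix, child.animationLocalMatrix, null);
            propagate(child);
        }
    }
}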
• 36. 5. Implementation 120690664 36
After doing so we can extract the 4th column and get the corresponding XYZ positions to draw the skeleton (Fig. 25).
5.1.2.3 Mesh Structure
A mesh, often called the skin, accompanies the skeleton. It is placed on top of the structure and is what the viewer actually sees. A model stores information about all of its vertices, normals and faces. A face contains two 3D vectors with vertex and normal indices. These indices are pointers into the lists of all the vertices and normals contained in the model. The data is extracted from the Collada file and stored in those objects. Each face also has a colour attribute. The skin has to be somehow "attached" to the skeleton, so weights have to be applied in order for the vertices to properly follow the joints.
5.1.2.4 Joint Weights and Skinning Algorithm
In the model, each vertex has zero, one or more weights associated with different joints. Simply put, this means each vertex can be affected by multiple joints. This is illustrated in Fig. 26: there are two bones i and j, the surrounding box represents the skin, and each of the black dots is a vertex. Notice how, as we get closer to one joint and further from the other, the weight of the former increases while that of the latter decreases. This produces a smooth bend between joint rotations. The skinning and the distribution of joint weights were done in Cinema 4D (Fig. 27).
Figure 25: Application of local transform to each child joint and resulting world final positions (courtesy of What-When-How.com).
Figure 26: Joint-weight distribution among vertices. Courtesy of What-When-How.com.
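Before looking at how the weights are stored in the Collada file, a rough sketch of the mesh data just described (class names and the exact layout are assumed, not taken from the project's code): each face stores indices into the model's shared vertex and normal lists together with a colour.

import java.util.ArrayList;
import java.util.List;
import org.lwjgl.util.vector.Vector3f;

// Sketch of the mesh data described above: faces index into shared vertex
// and normal lists instead of storing their own copies of the geometry.
class Face {
    int[] vertexIndices = new int[3];  // three corners of the triangle
    int[] normalIndices = new int[3];  // matching normals
    float[] colour = new float[3];     // RGB colour attribute
}

class Model {
    List<Vector3f> vertices = new ArrayList<>();
    List<Vector3f> normals = new ArrayList<>();
    List<Face> faces = new ArrayList<>();
}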
• 37. 5. Implementation 120690664 37
The avatar's skin is coloured in different shades showing the joints that affect different parts of the skin (different colour = different joint). In the Collada file this information is given as:
• one list of affecting joints (those that actually have an effect on the skin; the root in fact does not),
• a list of all the weights,
• a list of all vertices, where the position indicates the vertex index and the value indicates how many joints affect it, and
• a list of indices arranged in pairs, each containing the index of the affecting joint and the index of the affected vertex.
The total weight of each vertex, i.e. all affecting joints' weights combined, must be equal to 1. We will not go into the details of how the data is parsed, but in the end we are left with each joint having a pointer (vertexWeightIndices in the program) to its vertices and their matching weights. In addition, each joint is also given an InverseBindPose matrix. This matrix contains the necessary transformation to convert the model's bind pose position to the animated position. The complete skinning algorithm is identified as:
v' = \sum_{i=1}^{n} w_i M_i v    (1)
The algorithm (Eq. 1) takes the unchanged vector v and multiplies it with the joint's associated skinning matrix M_i and its affecting weight w_i. This calculation is done on the same vertex for all affecting joints, which explains the sum \sum_{i=1}^{n}. The aforementioned skinning matrix M_i is the result of multiplying the joint's World transformation matrix (WorldMatrix in Alg. 1) with its InverseBindPose matrix. Here is the same algorithm (Eq. 1) as used in the program, in pseudo-code (Algorithm 2), where:
• v' is the resulting vector
• v is the non-transformed vector
• n is the number of associated bones
• w_i is the weight associated with bone i
• M_i is the skinning matrix for bone i
Figure 27: Distribution of joint weights seen in the Cinema 4D viewport. The "weight painting" tool was used here to smoothly assign vertices to specific joints.
• 38. 5. Implementation 120690664 38
for each vertex in model:
    inputV = vertex
    outputV = new vector(0, 0, 0)
    total weight = 0
    for each bone:
        for each weightVertexIndex:
            if vertex == weightVertexIndex:
                total weight = total weight + weight
                v = SkinningMatrix transform inputV
                v = v * weight
                outputV = outputV + v
            END if
        END for
    END for
    if total weight not equal 1:
        normalised weight = 1 / total weight
        outputV = outputV * normalised weight
    END if
END for
To summarise Algorithm 2, the original vertex vertex is selected and a new blank vector outputV is created. All the bones are then searched to see which ones affect this particular vertex (if vertex == weightVertexIndex). If such a joint is found, its weight is added to the running total for a later check, and a new vertex v is obtained by transforming inputV with the joint's SkinningMatrix. v is then multiplied by the joint's weight and added to outputV. At the end, an if statement checks whether the total weight does not equal 1 (the weights need to be normalised for all vertices so that the effect is smooth), in which case outputV is multiplied by the normalised weight. The mesh is now ready to be displayed.
Algorithm 2: Algorithm calculateFinalPointPositions(Model m, Skeleton s)
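As an illustration of Eq. 1 and Algorithm 2, here is a minimal Java sketch of the skinning step (class and field names are assumed; each bone's skinning matrix is its world matrix multiplied by its inverse bind pose matrix, as described above, and the per-vertex weights are held in a simple map rather than the program's vertexWeightIndices structure):

import java.util.List;
import java.util.Map;
import org.lwjgl.util.vector.Matrix4f;
import org.lwjgl.util.vector.Vector4f;

// Sketch of linear blend skinning (Eq. 1 / Algorithm 2) for a single vertex.
class Skinning {

    static class SkinBone {
        Matrix4f skinningMatrix;           // world matrix * inverse bind pose matrix
        Map<Integer, Float> vertexWeights; // vertex index -> weight (assumed layout)
    }

    // bindPoseVertex is the untransformed vertex position with w = 1.
    static Vector4f skinVertex(int vertexIndex, Vector4f bindPoseVertex, List<SkinBone> bones) {
        Vector4f result = new Vector4f(0, 0, 0, 0);
        float totalWeight = 0;
        for (SkinBone bone : bones) {
            Float weight = bone.vertexWeights.get(vertexIndex);
            if (weight == null) {
                continue; // this bone does not affect the vertex
            }
            totalWeight += weight;
            // v = skinningMatrix * bindPoseVertex, scaled by the bone's weight
            Vector4f v = Matrix4f.transform(bone.skinningMatrix, bindPoseVertex, null);
            v.scale(weight);
            Vector4f.add(result, v, result);
        }
        if (totalWeight != 0 && totalWeight != 1) {
            result.scale(1f / totalWeight); // normalise if the weights do not sum to 1
        }
        return result;
    }
}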
• 39. 5. Implementation 120690664 39
5.1.2.5 Animation Algorithm
The loaded model needs to be animated, and the necessary information can also be found in the Collada file. However, this time a different file is imported, containing only the animations. This drastically reduces file size: one file holds only the initial position data (i.e. all of the aforementioned data), while multiple separate files hold the joint rotations for the animations. The main file is about 1.8 MB, while each animation file is around 50 kB. This is a big difference in terms of storage compared to pre-synthesised video. The animation data is given in XML format in the Collada file (Fig. 28):
<animation>
  <animation>
    <float_array>0 1 1.5</float_array>
    <float_array>92.3743 162.595 162.595</float_array>
    <channel target="ID67/rotateY.ANGLE"/>
  </animation>
  . . .
</animation>
The outer animation tag refers to an animation for a specific joint. Each outer animation tag has three inner animation tags: one for each axis X, Y and Z. The first float array contains the keyframes; these are the moments in time at which the joint will have a specific rotation applied to it. In the example, 0 means the start of the animation, 1 is 30 frames in, and 1.5 is consequently 45 frames in and the end of the clip (keyframe times are given in seconds, at 30 frames per second). The following float array contains the actual angles to be applied at each of the keyframes in the other array. Finally, the channel tag specifies which joint and which axis those changes have to be applied to.
We have therefore created an Animation object for each joint. This way all the animations for the clip can be stored individually for each joint movement. The Animation object consists of two main methods. setAngleDifference() iterates through the keyframe list and calculates the constant increase between two frames. To explain the idea further, say we have two keyframes with values 3 and 10, and the start angle is 90° while the end angle is 140°. To calculate the increase per frame, we simply compute |3 − 10| = 7 (the number of frames) and 140° − 90° = 50°. Thus the increase per frame will be 50/7 ≈ 7.14°. These values are stored in an ArrayList. Once all angle differences have been computed, we can create the complete angles. makeFullAngleList() iteratively adds the previously found difference to the start angle until it reaches the target end angle at the next keyframe. It is done this way because the Euler angles will not rotate properly if the angle difference is directly multiplied with the previous value; instead, it is necessary to add the difference to the original value and increase it step by step. This process is also known as interpolation. In this particular case the interpolation is linear, meaning the joints have a constant angle increase. Smoother animations can be obtained by using Spline interpolation, where the angle difference decreases/increases when reaching a separating keyframe (Fig. 29). Linear interpolation has been kept as the preferred method since hand and finger gestures are so short that the difference is barely noticeable.
Figure 29: Linear vs. Spline interpolation and the separating keyframes (red). Courtesy of Wikipedia.org.
Figure 28: Animation data in XML format from Collada file.
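To illustrate the linear interpolation performed by setAngleDifference() and makeFullAngleList(), here is a minimal sketch (the method below merges the two steps; keyframe times are assumed to have already been converted to frame numbers, and none of the names are taken from the actual code):

import java.util.ArrayList;
import java.util.List;

// Sketch of linear keyframe interpolation: expands keyframe angles into one
// angle per frame by repeatedly adding a constant per-frame difference.
class LinearInterpolation {

    static List<Float> makeFullAngleList(int[] keyframes, float[] keyAngles) {
        List<Float> angles = new ArrayList<>();
        for (int k = 0; k < keyframes.length - 1; k++) {
            int frames = Math.abs(keyframes[k + 1] - keyframes[k]);  // e.g. |3 - 10| = 7
            float step = (keyAngles[k + 1] - keyAngles[k]) / frames; // e.g. 50 / 7 ≈ 7.14
            float angle = keyAngles[k];
            for (int f = 0; f < frames; f++) {
                angles.add(angle);
                angle += step; // add the difference incrementally, frame by frame
            }
        }
        angles.add(keyAngles[keyAngles.length - 1]); // angle at the final keyframe
        return angles;
    }
}

For the example above, makeFullAngleList(new int[]{3, 10}, new float[]{90f, 140f}) produces the per-frame angles from 90° to 140° in steps of roughly 7.14°.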
• 40. 5. Implementation 120690664 40
Finally, the joints are modified using the Animation Algorithm. Very simply, it iterates through each animated joint, that is, every joint that needs to be animated (not all bones move during a clip). A loop goes through all the frames and calls the algorithm, passing it the joint and the current frame as arguments. The animate method looks similar to Algorithm 1, since they do basically the same thing, except that here the new rotation is calculated by extracting the angles computed in the Animation class. Then each joint's World matrix is produced and the new vertex points (the mesh) are stored per keyframe by calling the skinning algorithm. The mesh is now ready to be displayed.
5.1.3 OpenGL Engine in LWJGL
OpenGL was selected as the best programming interface to create a virtual environment in which to display all the animations needed for this project. Since Java was the language of choice, the LWJGL library was used to enable cross-platform access to the API.
5.1.3.1 AnimationModule Class
The AnimationModule class is the main component, or connector, of the animation implementation. This class drives the rest of the components by calling the model and skeleton loaders and the shader, and by creating the display where the animations are shown. A more complete 3D Animation Module class diagram of the animation engine can be found in Section 10.4 in the Appendix.
When called, the constructor for this class initialises the model by calling loadModel(file) in the MeshLoader class, which takes a file (the avatar Collada export in rest position), and similarly initialises the skeleton by calling the loadSkeleton(file) method in the SkeletonLoader class. It then creates the rotation matrices to build the model in its rest position. The next thing to be done is to construct the Display, where all the animations will be shown. Display is a class belonging to LWJGL that exposes only static methods and therefore cannot be instantiated; this resulted in creating a separate thread and looping the animation process. We will come back to this implementation choice in Section 5.6 Difficulties. The display is created but does not contain anything just yet. All the animations for the clip are loaded with the shader, which reads the XML file with the animations described earlier. The mesh is ready to be displayed, and this is where OpenGL comes into play.
5.1.3.2 Display Lists
Before starting to draw vertices, the 3D environment is prepared by setting up the projection matrix; this takes into account the specified Field of View and the maximum and minimum rendering distances. After that, lighting is created by enabling GL_LIGHTING (Fig. 30).
glShadeModel(GL_SMOOTH);
glEnable(GL_LIGHTING);
glEnable(GL_LIGHT0);
...
glLightModel(GL_LIGHT_MODEL_AMBIENT, {0.05f, 0.05f, 0.05f, 1f}); // in LWJGL the values are passed as a FloatBuffer
• 41. 5. Implementation 120690664 41
glLight(GL_LIGHT1, GL_POSITION, {1.5f, 1.5f, 1.5f, 1f});
...
glEnable(GL_CULL_FACE);
glCullFace(GL_BACK);
glEnable(GL_COLOR_MATERIAL);
In the above code, smooth shading is enabled to make the shadows less sharp, while GL_CULL_FACE determines whether some of the faces (back or front) have to be hidden. In our case GL_BACK means that back faces are not drawn. This is done since we do not want to see faces that are at the back of the skin.
Triangles now have to be drawn. In fact, most modelling software draws all sorts of surfaces using triangles. A display list is created for each part of the model; here we use one list for the details, which include the eyes and hair, while the body list draws the remaining skin. Inside the glNewList function we begin drawing triangles with glBegin(GL_TRIANGLES) and iterate through the faces in the model. The assigned colour, vertices and normals for each face are extracted. These last two geometrical components are represented as Vector3f objects (three floating-point values) and then passed to glVertex3f and glNormal3f respectively. Once all faces have been extracted, the list is ended and the process repeated for the next one. However, nothing is being displayed yet. Once all required lists have been loaded, they are called with glCallList(listName) and the display is updated. Subsequently all employed lists are deleted and memory is freed to start the process again. This is done for each frame of the clip.
In addition, since the program allows the user to change views, methods called iterateViews and enterFPV are also implemented. The former changes both the position and rotation of the point of view and sets the Field of View to 50°. enterFPV changes the position and rotation and sets the Field of View to 90°, giving a first-person view effect.
5.1.4 Summary
This was the most technically challenging section of the project, focusing entirely on the creation of an animation engine. OpenGL could have been used to a greater extent, since it uses GPU hardware acceleration to perform operations such as matrix calculations faster, thereby shortening loading times. However, using native Java code to implement all those tasks from scratch gave me a broader insight into the mathematics involved in systems such as game engines and 3D software editors. Additionally, the skills acquired will make it much easier to create a completely LWJGL-based engine in future work.
Figure 30: OpenGL code to initialise lighting and rendering settings
• 42. 5. Implementation 120690664 42
5.2 Development Environment
The software was developed in native Java together with LWJGL19 to access OpenGL's powerful graphical tools27 . All the coding was done in the NetBeans IDE29 , and multiple libraries were used, including JFreeChart30 , JVideoInput (a modified OpenIMAJ library31 ) and Slick32 to complement LWJGL. These external libraries were necessary to implement the progress feedback using bar charts and the camera module.
5.3 High Level Design
For this section and subchapter 5.4 Low Level Design, please refer to the Signer Class Diagram in Section 10.5 in the Appendix. The Model View Controller design pattern was determined to be best suited for this application since it allows a clear division between user-accessible and underlying components. Because the 3D animation is a very complex package in its own right, this pattern allows us to easily access the OpenGL features that are displayed in the user interface (View). A low-granularity diagram shows how the main packages interact with each other (Fig. 31).
5.3.1 Model
The Model contains all the necessary information relating to User data and animation. The package can directly interact with the 3D animation module, notifying it of changes happening in the Controller. These changes are usually a result of user input, for example clicking on a button to change the point of view. In addition to holding User and Animation data, the Model package partly contributes to displaying the User Interface by representing the background of the application's window system.
29 Oracle. (2014, December) NetBeans. [Online]. https://netbeans.org/
30 Object Refinery Ltd. (2015, February) JFreeChart. [Online]. http://www.jfree.org/jfreechart/
31 University of Southampton. (2015, February) OpenIMAJ. [Online]. http://www.openimaj.org/
32 Slick2D. (2014, November) Java Slick. [Online]. http://slick.ninjacave.com/
Figure 31: MVC design pattern on High Level. Solid arrows represent direct method calls, while dashed arrows represent event notifications.
• 43. 5. Implementation 120690664 43
5.3.2 View
This package contains all the classes that initialise the UI components visible to the end user, such as buttons and labels. The 3D animation is displayed in those windows where signing takes place; although it appears on the front-end, the logic is regulated by the Model package through the AnimationModule instance.
5.3.3 Controller
The Controller is responsible for handling the actions of the User when interacting with UI elements such as buttons. It records the events and notifies the Model of User data or animation changes. It also switches display windows by calling other instances of view controllers.
5.4 Low Level Design
Here we go into further detail on how the most important classes interact with one another. Figure 32 shows an overview of all the existing packages used in the application. Not all packages are analysed, since some are not necessary to the understanding of the program. Instead, the main classes present in the Model, View, Controller and Extra packages will be described and explained to give an insight into the system structure. The AvatarAnimation package was already discussed in the 3D Animation Module chapter of the report, and will therefore not be covered again.
Figure 32: Package class diagram giving an overview of the system. Some packages contain classes for multiple objects like Extra, Buttons and Windows.
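To make the interaction between the three layers of Section 5.3 more concrete, here is a minimal sketch of the flow triggered by a "change point of view" button (all class and method names except AnimationModule and iterateViews are assumed, and AnimationModule is reduced to a stub; the actual classes are described in the following sections):

// Minimal MVC sketch: the Controller reacts to a button event, the Model is
// notified and drives the animation module, and the View only shows components.
interface AnimationModule {       // stub standing in for the real AnimationModule class
    void iterateViews();
}

class AppModel {                  // Model layer
    private final AnimationModule animation;

    AppModel(AnimationModule animation) { this.animation = animation; }

    void changeAnimationView() {
        animation.iterateViews(); // method name taken from Section 5.1.3.2
    }
}

class ViewController {            // Controller layer
    private final AppModel model;

    ViewController(AppModel model) { this.model = model; }

    void onChangeViewClicked() {  // called by a button listener in the View
        model.changeAnimationView();
    }
}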
• 44. 5. Implementation 120690664 44
5.4.1 Model Classes
This package contains two classes:
• BgFrame: this class is the backbone containing the User and AnimationModule objects. It also includes elements for the background, banner and logo of the application windows.
• User: information about the user is stored in this class. User objects comprise variables for the username and password, as well as other data such as avatar preference and current skill level.
5.4.2 View Classes
This package contains all of the user interface objects necessary to use the program. Each major component has a separate class; for example, Lesson and Test are different components. The classes in this package do not have any functional methods, except for those initialising or modifying components. The classes are:
• LoginPage: this class initialises all components for the login window. It therefore contains user login information (username, password) and signup and login buttons.
• SignupDetailsPage: this class contains components for signing up, and thus includes several colour-choice buttons as well as text fields for the user to register (or modify) their details. Because signing up and editing details require the same UI elements, the class is used to fulfil both functions.
• MainPage: this class displays the 3-column menu (as seen in Figure 13). It is used to access everything in the application: user settings, lessons, tests and the progress page.
• LessonPage: here the components needed for the lesson layout are created and initialised. These include a variety of buttons for navigation inside the 3D viewport, as well as sound control and window navigation. Labels displaying the text for the signs and the English text are also shown under the 3D viewport.
• TestPage: this class is slightly different from the others in the View package. Although it also only initialises and displays UI elements, three if statements are used to properly separate each test category and its required components. For example, if the currently selected test category is Recognising, then only those elements needed for the recognition challenge will be initialised; similarly for the other two challenges.
• ProgressPage: this class displays a bar chart with the user's progress as read from a file. It is the only class in the View package with methods other than component initialisers. It reads a text file called "progresses.txt" for the user currently logged in. It then parses the data by getting every milestone's value and date. For example "Thu 2015-02-27 3 0" means the user became
• 45. 5. Implementation 120690664 45
level 3 on that specific Thursday. With the precise date information it compiles the data to be displayed on the bar chart. The page also displays the last day of activity using the last known date from the file. JFreeChart30 was used for the bar chart creation.
5.4.3 Controller Classes
These classes are linked to their correspondingly named equivalents in the View package. When entered, the matching View class is called to display the window elements. Each of these classes contains the functional methods and events that make user interaction possible. They also control the animation module through the BgFrame by notifying its thread.
• Login: this class contains the main method and is the starting point of the program logic. It creates the BgFrame instance and, through it, initialises the animation process in the animation module. It also uses a text parser to load user data from the file "users.txt" when the login button in the LoginPage is clicked. User details are checked to see if an account for the entered username exists; otherwise the user is asked to sign up.
• SignupDetails: if the user does not have an account they must sign up. This class simply receives all the information from the SignupDetailsPage components, such as text fields and colour buttons. It then creates a new account if the user is signing up, or updates the current details if the user is editing them.
• Main: once an account is created, or the user is logged in correctly, this class calls the window elements in MainPage and waits for user input. Depending on which buttons are pressed, the class will call the constructors for the chosen section. For example, clicking on a button under the Lesson category will call the Lesson class, clicking the Progress button will call the Progress class, and so on.
• Lesson: when the user selects a lesson, this class reads lesson data from the "lesson.txt" file. The file structure includes an integer determining the lesson number, with each following line describing the sign, the sentence and the sound file to be played. It then initialises the UI components through the LessonPage and wakes up the animation thread. In fact, the animation thread is started in the Login class when the program is first launched and is then put in an idle state waiting for user actions. Waking up the thread results in the animation file (specified in the lesson.txt file) being loaded and the animations being constructed. Then the animation viewport is opened and the animation displayed and looped.
• Test: this class, analogously to its View package equivalent, has more methods than the others in this package. In fact it contains all of the button events and other action methods for every test category; however, only those required for the currently displayed test are employed. Similarly to the Lesson class, the file containing the test data is read and loaded into an array. The data
• 46. 5. Implementation 120690664 46
is then sent to the animation module, and other elements such as questions and multiple-choice answers are displayed on the screen.
• Progress: this very short class simply calls the constructor of ProgressPage to initialise the UI elements. It also contains methods to discard them and load the previous Menu window.
5.4.4 Extra Classes
This package includes a number of classes very different from one another. They mainly represent auxiliary objects employed by the program to fulfil different tasks.
• HideAndProgress: this special JFrame window represents a loading screen. Its task is to appear in front of the animation viewport before the animation is played. This is done because building an animation takes a few seconds, and in that period the user would otherwise just be left waiting for something to happen. With a loading screen we can give feedback to the user, letting them know approximately how much time is left through a progress bar and a description of the background process (Fig. 33).
• MP3: this class builds a sound object to be played back during animations. It takes a file path and reads .mp3 files. Java includes a .wav player within its built-in libraries, but .wav files are very large compared to compressed .mp3 files. Thus it was decided that using an extra library would save storage space compared to using many .wav files. In addition to saving space, the library is also very easy to use. Each lesson animation has an audio file reading the same sentence in English to the user. The MP3 module uses the JLayer33 library.
33 JavaZOOM. (2008, November) JLayer - MP3 Library. [Online]. http://www.javazoom.net/javalayer/javalayer.html
Figure 33: Progress window; an instance of HideAndProgress shows the background process of the animation construction to the user.
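As a minimal sketch of how an .mp3 file can be played back with JLayer's Player class (the file name is illustrative, and running the playback on a background thread so the animation loop is not blocked is an assumption about the design rather than a detail taken from the report):

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import javazoom.jl.player.Player;

// Plays an .mp3 file with JLayer on a background thread so the caller
// (e.g. the lesson window) is not blocked while the audio is decoded.
public class Mp3Sketch {

    public static void play(final String path) {
        new Thread(new Runnable() {
            public void run() {
                try {
                    Player player = new Player(new BufferedInputStream(new FileInputStream(path)));
                    player.play();  // blocks until the clip has finished
                    player.close();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }).start();
    }

    public static void main(String[] args) {
        play("lesson1.mp3"); // file name is illustrative
    }
}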