Proceedings of the 2010 CRC PhD Student
               Conference



   Centre for Research in Computing
         The Open University
             Milton Keynes



            June 3 and 4, 2010




    Conference organization:
         Marian Petre
          Robin Laney
       Mathieu D’Aquin
          Paul Piwek
         Debbie Briggs




           May 2010
    Proceedings compiled by
          Paul Piwek
Table of Contents


Mihhail Aizatulin Verifying Implementations of Security      .........   1
                  Protocols
                  in C

Simon Butler        Analysing Semantic Networks of           .........   5
                    Identifier Names to Improve Source
                    Code Maintainability and Quality

Tom Collins         Discovering Translational Patterns in    .........   9
                    Symbolic Representation of Music

Joe Corneli         Semantic Adaptivity and Social           .........   12
                    Networking in Personal Learning
                    Networks

Richard Doust       Investigating narrative “effects”: the   .........   15
                    case of suspense

Francois            Verifying Authentication Properties of   .........   19
Dupressoir          C Security Protocol Code Using
                    General Verifiers

Jennifer Ferreira   Agile development and usability in       .........   23
                    practice: Work cultures of
                    engagement

Michael A           Model Driven Architecture of Large       .........   26
Giddings            Distributed Hard Real Time Systems

Alan Hayes          An Investigation into Design             .........   30
                    Diagrams and Their Implementations

Robina              An Investigation into Interoperability   .........   33
Hetherington        of Data Between Software Packages
                    used to support the Design, Analysis
                    and Visualisation of Low Carbon
                    Buildings

Chris Ireland       Understanding Object-Relational          .........   37
                    Impedance Mismatch: A Framework
                    Based Approach
Lukasz             “Privacy Shake”, a Haptic Interface    .........   41
Jedrzejczyk        for Managing Privacy Settings in
                   Mobile Location Sharing Applications

Stefan             Designing a Climate Change Game for .........      45
Kreitmayer         Interactive Tabletops

Tamara Lopez       Reasoning about Flaws in Software      .........   47
                   Design: Diagnosis and Recovery

Lin Ma             Presupposition Analysis in             .........   51
                   Requirements

Lionel Montrieux   Merging Verifiable and Evolving        .........   55
                   Access Control Properties

Sharon Moyo        Effective Tutoring with Affective      .........   58
                   Embodied Conversational Agents

Brendan Murphy     Evaluating a mobile learning           .........   60
                   environment in a home car domain

Tu Anh Nguyen      Generating Accessible Natural          .........   65
                   Language Explanations for OWL
                   Ontologies

Chwhynny           Supporting the Exploration of          .........   69
Overbeeke          Research Spaces

Nadia Pantidi      Understanding technology-rich          .........   74
                   learning spaces

Aleksandra         How best to support scientific end-    .........   78
Pawlik             user software development?

Brian Pluss        Non-Cooperation in Computational       .........   82
                   Models of Dialogue

Ivana Quinto       A Debate Dashboard to Support the      .........   86
                   Adoption of On-line Argument
                   Mapping Tools

Adam Rae           Supporting multimodal media            .........   91
                   recommendation and annotation
                   using social network analysis

Rien Sach          The effect of Feedback                 .........   95
Stefan            Using Business Process Security        .........    98
Taubenberger      Requirements for IT Security Risk
                  Assessment

Keerthi Thomas    Distilling Privacy Requirements for    .........    102
                  Mobile Applications

Min Q. Tran       Understanding the Influence of 3D      .........    104
                  Virtual Worlds on Perceptions of 2D E-
                  commerce Websites

Thomas Daniel     Supporting Reflection about Web        .........    108
Ullmann           Resources within Mash-Up Learning
                  Environments

Rean van der      Local civic governance using online    .........    110
Merwe             media – a case of consensual problem
                  solving or a recalcitrant pluralism

Katie Wilkie      Analysis of conceptual metaphors to    .........    114
                  inform music interaction designs

Anna Xambo        Issues and techniques for collaborative .........   118
                  music making on multi-touch surfaces

Saad Bin Saleem   A Release Planning Model to Handle     .........    122
                  Security Requirements




Verifying Implementations of Security Protocols in C
                             Mihhail Aizatulin
                           m.aizatulin@open.ac.uk


  Supervisors         Dr Andrew Gordon, adg@microsoft.com,
                      Dr Jan Jürjens, jan.jurjens@cs.tu-dortmund.de,
                      Prof Bashar Nuseibeh, B.Nuseibeh@open.ac.uk
  Department          Computing
  Status              Full-time
  Probation viva      Passed
  Starting date       November 2008
     Our goal is verification of cryptographic protocol implementations (such as
 OpenSSL or Kerberos), motivated by the desire to minimise the gap between
 verified and executable code. Very little has been done in this area. There are
 numerous tools to find low-level bugs in code (such as buffer overflows and division
 by zero) and there are verifiers for cryptographic protocols that work on fairly
 abstract descriptions, but so far very few attempts have been made to verify
 cryptographic security directly on the code, especially for low-level languages
 like C.
     We attempt to verify the protocol code by extracting an abstract model that
 can be used in high-level cryptographic verification tools such as ProVerif or
  CryptoVerif. This is the first such approach that we are aware of. We are currently
  investigating the feasibility of the approach by extracting the model from running
  code, using so-called concolic (concrete + symbolic) execution. We run
 the protocol implementation normally, but at the same time we record all the
 operations performed on binary values and then replay those operations on
 symbolic values. The resulting symbolic expressions reveal the structure of the
 messages sent to the network and the conditions that are checked for incoming
 messages.
     We are able to produce symbolic execution traces for the handshake imple-
 mented in the OpenSSL library. To give an example of what the extracted traces
 look like, consider a simple request-response protocol, protected by hashing with
 a shared key:
                    A → B : m | hash(‘request’ | m, kAB),
                    B → A : m′ | hash(‘response’ | m | m′, kAB).
  We implemented the protocol in about 600 lines of C code, calling the OpenSSL
 cryptographic library. Our concolic execution tool produces a trace of 8 lines








write(i39)
payload1 = payload()
key2 = key()
write(i14|7c|payload1|HMAC(sha1, i7|7c52657175657374|payload1, key2))
msg3 = read()
var4 = msg3{5,23}
branchF((memcmp(msg3{28,20},
                HMAC(sha1, i8|7c526573706f6e7365|i14|7c|payload1|var4, key2)) != i0))
accept(var4)


Figure 1: An excerpt from the symbolic client trace. X{start, len} denotes
the substring of X starting at start of length len. iN is an integer with value N
(width information is omitted), and branchT and branchF are the true or false
branches taken by the code.


for the client side shown in figure 1: we see the client sending the request and
checking the condition on the server response before accepting it.
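
To make the setting concrete, the following is a minimal C sketch of the kind of
client-side request construction from which such a trace is extracted. It is not the
actual 600-line implementation: the function name build_request, the buffer sizes
and the exact tagging of the authenticated data are illustrative assumptions, and
only the OpenSSL HMAC() call reflects the real library API.

#include <string.h>
#include <openssl/evp.h>
#include <openssl/hmac.h>

/* Illustrative sketch: build the wire message  m | hmac('request' | m, kAB). */
size_t build_request(const unsigned char *m, size_t m_len,
                     const unsigned char *key, size_t key_len,
                     unsigned char *out /* must hold m_len + 20 bytes */)
{
    unsigned char tagged[1024];
    unsigned int mac_len = 0;
    const char *tag = "request|";
    size_t tag_len = strlen(tag);

    if (tag_len + m_len > sizeof(tagged))
        return 0;                           /* payload too large for this sketch */

    /* 'request' | m : the data that is authenticated */
    memcpy(tagged, tag, tag_len);
    memcpy(tagged + tag_len, m, m_len);

    /* m | HMAC-SHA1('request' | m, kAB) : the message put on the wire */
    memcpy(out, m, m_len);
    HMAC(EVP_sha1(), key, (int) key_len, tagged, tag_len + m_len,
         out + m_len, &mac_len);

    return m_len + mac_len;                 /* the SHA-1 MAC adds 20 bytes */
}

Concolic execution of code along these lines is what yields the write(...) line of
the trace: the concrete bytes are sent as usual, while the recorded operations are
replayed to reproduce the concatenations and the HMAC application over symbolic
values.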
    We are currently working to implement symbolic handling of buffer lengths
and sound handling of loops as well as making the extracted models compatible
with those understood by ProVerif and CryptoVerif, in particular simplifying
away any remaining arithmetic expressions from the symbolic trace.
    One obvious drawback of concolic execution is that it only follows the single
path that was actually taken by the code. This is enough to produce an accurate
model when there is only one main path; however, libraries like OpenSSL contain
multiple nontrivial paths. Thus, to achieve verification of those libraries, we
plan to move the analysis towards being fully static in the future.

Related Work One of the earliest attempts at security verification directly
on code is probably CSur [Goubault-Larrecq and Parrennes, 2005], which deals
with C protocol implementations. It translates programs into a set
of Horn clauses that are fed into a general-purpose theorem prover.
Unfortunately, it never went beyond some very simple implementations and has
not been developed since.
    The work [Jürjens, 2006] describes an approach that translates Java programs
in a similar manner. In our work we try to separate reasoning about
pointers and integers from reasoning about cryptography, in the hope of achieving
greater scalability.
    Some work has been done on verification of functional language implementa-
tions, either by translating the programs directly into π-calculus [Bhargavan et
al., 2006; Bhargavan et al., 2008] or by designing a type system that enforces
security [Bengtson et al., 2008]. Unfortunately, it is not trivial to adapt such
approaches to C-like languages.
    ASPIER [Chaki and Datta, 2008] uses model checking for verification and
has been applied to OpenSSL. However, it does not truly start from C code: any
code explicitly dealing with pointers needs to be replaced by abstract summaries








that presumably have to be written manually.
    Concolic execution is widely used to drive automatic test generation, as in
[Cadar et al., 2008] or [Godefroid et al., 2008]. One difference in our concolic
execution is that we need to assign symbols to whole bitstrings, whereas the
testing frameworks usually assign symbols to single bytes. We believe that our
work could be adapted for testing cryptographic software. Typical testing
approaches try to create an input that satisfies a set of equations resulting from
checks in the code. In the presence of cryptography such equations will (hopefully) be
impossible to solve, so a more abstract model like ours might be useful.
    A separate line of work deals with reconstruction of protocol message formats
from implementation binaries [Caballero et al., 2007; Lin et al., 2008; Wondracek
et al., 2008; Cui et al., 2008; Wang et al., 2009]. The goal is typically to
reconstruct field boundaries of a single message by observing how the binary
processes the message. Our premises and goals are different: we have the
advantage of starting from the source code, but in exchange we aim to reconstruct
the whole protocol flow instead of just a single message. Our reconstruction
needs to be sound to enable verification — all possible protocol flows should be
accounted for.


References
[Bengtson et al., 2008] Jesper Bengtson, Karthikeyan Bhargavan, Cédric Fournet,
  Andrew D. Gordon, and Sergio Maffeis. Refinement types for secure
  implementations. In CSF ’08: Proceedings of the 2008 21st IEEE Computer
  Security Foundations Symposium, pages 17–32, Washington, DC, USA, 2008.
  IEEE Computer Society.
[Bhargavan et al., 2006] Karthikeyan Bhargavan, Cédric Fournet, Andrew D.
  Gordon, and Stephen Tse. Verified interoperable implementations of security
  protocols. In CSFW ’06: Proceedings of the 19th IEEE workshop on Computer
  Security Foundations, pages 139–152, Washington, DC, USA, 2006. IEEE
  Computer Society.
[Bhargavan et al., 2008] Karthikeyan Bhargavan, Cédric Fournet, Ricardo Corin,
  and Eugen Zalinescu. Cryptographically verified implementations for TLS.
  In CCS ’08: Proceedings of the 15th ACM conference on Computer and
  communications security, pages 459–468, New York, NY, USA, 2008. ACM.
[Caballero et al., 2007] Juan Caballero, Heng Yin, Zhenkai Liang, and Dawn
  Song. Polyglot: automatic extraction of protocol message format using
  dynamic binary analysis. In CCS ’07: Proceedings of the 14th ACM conference
  on Computer and communications security, pages 317–329, New York, NY,
  USA, 2007. ACM.
[Cadar et al., 2008] Cristian Cadar, Daniel Dunbar, and Dawson Engler. KLEE:
  Unassisted and automatic generation of high-coverage tests for complex sys-








  tems programs. In USENIX Symposium on Operating Systems Design and
  Implementation (OSDI 2008), San Diego, CA, December 2008.
[Chaki and Datta, 2008] Sagar Chaki and Anupam Datta. ASPIER: An auto-
  mated framework for verifying security protocol implementations. Technical
  Report 08-012, Carnegie Mellon University, October 2008.
[Cui et al., 2008] Weidong Cui, Marcus Peinado, Karl Chen, Helen J. Wang, and
   Luis Irun-Briz. Tupni: automatic reverse engineering of input formats. In CCS
  ’08: Proceedings of the 15th ACM conference on Computer and communications
   security, pages 391–402, New York, NY, USA, 2008. ACM.
[DBL, 2008] Proceedings of the Network and Distributed System Security Sympo-
  sium, NDSS 2008, San Diego, California, USA, 10th February - 13th February
  2008. The Internet Society, 2008.
[Godefroid et al., 2008] Patrice Godefroid, Michael Y. Levin, and David A. Mol-
  nar. Automated whitebox fuzz testing. In NDSS [2008].
[Goubault-Larrecq and Parrennes, 2005] J. Goubault-Larrecq and F. Parrennes.
  Cryptographic protocol analysis on real C code. In Proceedings of the 6th
  International Conference on Verification, Model Checking and Abstract Inter-
  pretation (VMCAI’05), volume 3385 of Lecture Notes in Computer Science,
  pages 363–379. Springer, 2005.
[Jürjens, 2006] Jan Jürjens. Security analysis of crypto-based Java programs
  using automated theorem provers. In ASE ’06: Proceedings of the 21st
  IEEE/ACM International Conference on Automated Software Engineering,
  pages 167–176, Washington, DC, USA, 2006. IEEE Computer Society.
[Lin et al., 2008] Zhiqiang Lin, Xuxian Jiang, Dongyan Xu, and Xiangyu Zhang.
  Automatic protocol format reverse engineering through context-aware moni-
   tored execution. In NDSS [2008].
[Wang et al., 2009] Zhi Wang, Xuxian Jiang, Weidong Cui, Xinyuan Wang, and
  Mike Grace. Reformat: Automatic reverse engineering of encrypted messages.
  In Michael Backes and Peng Ning, editors, ESORICS, volume 5789 of Lecture
  Notes in Computer Science, pages 200–215. Springer, 2009.
[Wondracek et al., 2008] Gilbert Wondracek, Paolo Milani Comparetti, Christo-
  pher Kruegel, and Engin Kirda. Automatic Network Protocol Analysis. In
  15th Symposium on Network and Distributed System Security (NDSS), 2008.








    Analysing semantic networks of
identifier names to improve source code
       maintainability and quality
                              Simon Butler
                        sjb792@student.open.ac.uk


    Supervisors                  Michel Wermelinger, Yijun Yu & Helen Sharp
    Department/Institute         Centre for Research in Computing
    Status                       Part-time
    Probation viva               After
    Starting date                October 2008



Source code is the written expression of a software design consisting of identifier
names – natural language phrases that represent concepts being manipulated
by the program – embedded in a framework of keywords and operators provided
by the programming language. Identifiers are crucial for program comprehen-
sion [9], a necessary activity in the development and maintenance of software.
Despite their importance, there is little understanding of the relationship be-
tween identifier names and source code quality and maintainability. Neither is
there automated support for identifier management or the selection of relevant
natural language content for identifiers during software development.
    We will extend current understanding of the relationship between identifier
name quality and source code quality and maintainability by developing tech-
niques to analyse identifiers for meaning, modelling the semantic relationships
between identifiers and empirically validating the models against measures of
maintainability and software quality. We will also apply the analysis and mod-
elling techniques in a tool to support the selection and management of identifier
names during software development, and concept identification and location for
program comprehension.
    The consistent use of clear identifier names is known to aid program com-
prehension [4, 7, 8]. However, despite the advice given in programming conven-
tions and the popular programming literature on the use of meaningful identifier
names in source code, the reality is that identifier names are not always meaning-
ful, may be selected in an ad hoc manner, and do not always follow conventions
[5, 1, 2].
    Researchers in the reverse engineering community have constructed mod-
els to support program comprehension. The models range in complexity from
textual search systems [11], to RDF-OWL ontologies created either solely from
source code and identifier names [8], or with the inclusion of supporting doc-
umentation and source code comments [13]. The ontologies typically focus on








class and method names, and are used for concept identification and location
based on the lexical similarity of identifier names. The approach, however, does
not directly address the quality of identifier names used.
    The development of detailed identifier name analysis has focused on method
names because their visibility and reuse in APIs implies a greater need for them
to contain clear information about their purpose [10]. Caprile and Tonella [3]
derived both a grammar and vocabulary for C function identifiers, sufficient
for the implementation of automated name refactoring. Høst and Østvold [5]
have since analysed Java method names looking for a common vocabulary that
could form the basis of a naming scheme for Java methods. Their analysis of
the method names used in multiple Java projects found common grammatical
forms; however, there were too many degenerate forms for them to be able to
derive a grammar for Java method names.
    The consequences of identifier naming problems have been considered to be
largely confined to the domain of program comprehension. However, Deißenböck
and Pizka observed an improvement in maintainability when their rules of con-
cise and consistent naming were applied to a project [4], and our recent work
found statistical associations between identifier name quality and source code
quality [1, 2]. Our studies, however, only looked at the construction of the
identifier names in isolation, and not at the relationships between the meaning
of the natural language content of the identifiers. We hypothesise that a rela-
tionship exists between the quality of identifier names, in terms of their natural
language content and semantic relationships, and the quality of source code,
which can be understood in terms of the functionality, reliability, and usability
of the resulting software, and its maintainability [6]. Accordingly, we seek to
answer the following research question:

         How are the semantic relationships between identifier names, in-
     ferred from their natural language content and programming lan-
     guage structure, related to source code maintainability and quality?

    We will construct models of source code as semantic networks predicated
on both the semantic content of identifier names and the relationships between
identifier names inferred from the programming language structure. For exam-
ple, the simple class Car in Figure 1 may be represented by the semantic network
in Figure 2. Such models can be applied to support empirical investigations of
the relationship between identifier name quality and source code quality and
maintainability. The models may also be used in tools to support the manage-
ment and selection of identifier names during software development, and to aid
concept identification and location during source code maintenance.

public class Car extends Vehicle {
    Engine engine;
}

                           Figure 1: The class Car

   We will analyse identifier names mined from open source Java projects to
create a catalogue of identifier structures to understand the mechanisms em-
ployed by developers to encode domain information in identifiers. We will build








on the existing analyses of C function and Java method identifier names [3, 5, 8],
and anticipate the need to develop additional techniques to analyse identifiers,
particularly variable identifier names.

                    Car --extends--> Vehicle
                    Car --has a--> Engine
                    Engine --has instance named--> engine

                Figure 2: A semantic network of the class Car


    Modelling of both the structural and semantic relationships between iden-
tifiers can be accomplished using Gellish [12], an extensible controlled natural
language with dictionaries for natural languages – Gellish English being the
variant for the English language. Unlike a conventional dictionary, a Gellish
dictionary includes human- and machine-readable links between entries to de-
fine relationships between concepts – thus making Gellish a semantic network –
and to show hierarchical linguistic relationships such as meronymy, an entity–
component relationship. Gellish dictionaries also permit the creation of multiple
conceptual links for individual entries to define polysemic senses.
    The natural language relationships catalogued in Gellish can be applied to
establish whether the structural relationship between two identifiers implied by
the programming language is consistent with the conventional meaning of the
natural language found in the identifier names. For example, a field is implic-
itly a component of the containing class allowing the inference of a conceptual
and linguistic relationship between class and field identifier names. Any incon-
sistency between the two relationships could indicate potential problems with
either the design or with the natural language content of the identifier names.
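
As a small illustration of such a check (a sketch only: the triple layout, the
dictionary entries and the relation names below are assumptions made for this
example rather than Gellish itself), the idea could be prototyped along these
lines:

#include <stddef.h>
#include <string.h>

/* A tiny stand-in for a fragment of a Gellish-style dictionary: each entry
 * links two concepts by a named relation.  Entries are invented for this
 * example. */
struct relation { const char *subject, *name, *object; };

static const struct relation dictionary[] = {
    { "car", "has as part",  "engine"  },   /* meronymy */
    { "car", "is a kind of", "vehicle" },   /* hyponymy */
};

/* Is the structural relation "class contains field" backed by a part-whole
 * relation between the corresponding concepts in the dictionary? */
static int field_is_consistent(const char *class_concept, const char *field_concept)
{
    for (size_t i = 0; i < sizeof dictionary / sizeof dictionary[0]; i++)
        if (strcmp(dictionary[i].name, "has as part") == 0 &&
            strcmp(dictionary[i].subject, class_concept) == 0 &&
            strcmp(dictionary[i].object, field_concept) == 0)
            return 1;
    return 0;   /* no supporting conceptual relationship found */
}

For the class in Figure 1, field_is_consistent("car", "engine") succeeds, whereas a
field whose name has no conceptual relationship to the class concept would be
flagged for inspection as a potential design or naming problem.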
    We have assumed a model of source code development and comprehension
predicated on the idea that it is advantageous for coherent and relevant semantic
relationships to exist between identifier names based on their natural language
content. To assess the relevance of our model to real-world source code we
will validate the underlying assumption empirically. We intend to mine both
software repositories and defect reporting systems to identify source code impli-
cated in defect reports and evaluate the source code in terms of the coherence
and consistency of models of its identifiers. To assess maintainability we will
investigate how source code implicated in defect reports develops in successive
versions – e.g. is the code a continuing source of defects? – and monitor areas of
source code modified between versions to determine how well our model predicts
defect-prone and defect-free regions of source code.
    We will apply the results of our research to develop a tool to support the
selection and management of identifier names during software development, as
well as modelling source code to support software maintenance. We will evaluate
and validate the tool with software developers – both industry partners and
FLOSS developers – to establish the value of identifier naming support. While
intended for software developers, the visualisations of source code presented by








the tool will enable stakeholders (e.g. domain experts) who are not literate
in programming or modelling languages (like Java and UML) to examine, and give
feedback on, the representation of domain concepts in source code.


References
 [1] S. Butler, M. Wermelinger, Y. Yu, and H. Sharp. Relating identifier naming
     flaws and code quality: an empirical study. In Proc. of the Working Conf.
     on Reverse Engineering, pages 31–35. IEEE Computer Society, 2009.
 [2] S. Butler, M. Wermelinger, Y. Yu, and H. Sharp. Exploring the influence
     of identifier names on code quality: an empirical study. In Proc. of the
     14th European Conf. on Software Maintenance and Reengineering, pages
     159–168. IEEE Computer Society, 2010.
 [3] B. Caprile and P. Tonella. Restructuring program identifier names. In
     Proc. Int’l Conf. on Software Maintenance, pages 97–107. IEEE, 2000.
 [4] F. Deißenböck and M. Pizka. Concise and consistent naming. Software
     Quality Journal, 14(3):261–282, Sep 2006.
 [5] E. W. Høst and B. M. Østvold. The Java programmer’s phrase book.
     In Software Language Engineering, volume 5452 of LNCS, pages 322–341.
     Springer, 2008.
 [6] International Standards Organisation. ISO/IEC 9126-1: Software engineer-
     ing – product quality, 2001.
 [7] D. Lawrie, H. Feild, and D. Binkley. An empirical study of rules for well-
     formed identifiers. Journal of Software Maintenance and Evolution: Re-
     search and Practice, 19(4):205–229, 2007.
 [8] D. Raţiu. Intentional Meaning of Programs. PhD thesis, Technische Uni-
      versität München, 2009.
 [9] V. Rajlich and N. Wilde. The role of concepts in program comprehension.
     In Proc. 10th Int’l Workshop on Program Comprehension, pages 271–278.
     IEEE, 2002.
[10] M. Robillard. What makes APIs hard to learn? Answers from developers.
     IEEE Software, 26(6):27–34, Nov.-Dec. 2009.
[11] G. Sridhara, E. Hill, L. Pollock, and K. Vijay-Shanker. Identifying word
     relations in software: a comparative study of semantic similarity tools. In
     Proc Int’l Conf. on Program Comprehension, pages 123–132. IEEE, June
     2008.
[12] A. S. H. P. van Renssen. Gellish: a generic extensible ontological language.
     Delft University Press, 2005.
[13] R. Witte, Y. Zhang, and J. Rilling. Empowering software maintainers with
     semantic web technologies. In European Semantic Web Conf., pages 37–52,
     2007.







                  Discovering translational patterns
                 in symbolic representations of music

                                   Tom Collins
                        http://users.mct.open.ac.uk/tec69

Supervisors          Robin Laney
                     Alistair Willis
                     Paul Garthwaite
Department/Institute Centre for Research in Computing
Status               Fulltime
Probation viva       After
Starting date        October 2008



                                 RESEARCH QUESTION

How can current methods for pattern discovery in music be improved and integrated
into an automated composition system?

The presentation will address the first half of this research question: how can current
methods for pattern discovery in music be improved?

                          INTRA-OPUS PATTERN DISCOVERY

Suppose that you wish to get to know a particular piece of music, and that you have a
copy of the score of the piece or a MIDI file. (Scores and MIDI files are symbolic
representations of music and are the focus of my presentation, as opposed to sound
recordings.) Typically, to become familiar with a piece, one listens to the MIDI file or
studies/plays through the score, gaining an appreciation of where and how material is
repeated, and perhaps also gaining an appreciation of the underlying structure.

The literature contains several algorithmic approaches to this task, referred to as
‘intra-opus’ pattern discovery [2, 4, 5]. Given a piece of music in a symbolic
representation, the aim is to define and evaluate an algorithm that discovers and
returns patterns occurring within the piece. Some potential applications for such an
algorithm are as follows:

   •   A pattern discovery tool to aid music students.
   •   Comparing an algorithm’s discoveries with those of a music expert as a means
       of investigating human perception of music.
   •   Stylistic composition (the process of writing in the style of another composer
       or period) assisted by using the patterns/structure returned by a pattern
       discovery algorithm [1, 3].







                                 TWO IMPROVEMENTS

Current methods for pattern discovery in music can be improved in two ways:

   1. The way in which the algorithm’s discoveries are displayed for a user can be
      improved.
   2. A new algorithm can be said to improve upon existing algorithms if, according
      to standard metrics, it is the strongest-performing algorithm on a certain task.

Addressing the first area for improvement, suppose that an algorithm has discovered
hundreds of patterns within a piece of music. Now these must be presented to the
user, but in what order? Various formulae have been proposed for rating a discovered
pattern, based on variables that quantify attributes of that pattern and the piece of
music in which it appears [2, 4]. To my knowledge, none have been derived or
validated empirically. So I conducted a study in which music undergraduates
examined excerpts taken from Chopin’s mazurkas and were instructed to rate already-
discovered patterns, giving high ratings to patterns that they thought were noticeable
and/or important. A model relating participants’ ratings to the attributes was
determined using variable selection and cross-validation. This model leads to a new
formula for rating discovered patterns, and the basis for this formula constitutes a
methodological improvement.

Addressing the second area for improvement, I asked a music analyst to analyse two
sonatas by Domenico Scarlatti and two preludes by Johann Sebastian Bach. The brief
was similar to the intra-opus discovery task described above: given a piece of music
in staff notation, discover translational patterns that occur within the piece. Thus, a
benchmark of translational patterns was formed for each piece, the criteria for
benchmark membership being left largely to the analyst’s discretion. Three
algorithms—SIA [5], COSIATEC [4] and my own, SIACT—were run on the same
pieces and their performance was evaluated in terms of recall and precision. If an
algorithm discovers x of the y patterns discovered by the analyst then its recall is x/y.
If the algorithm also returns z patterns that are not in the analyst’s benchmark then the
algorithm’s precision is x/(x + z). It was found that my algorithm, SIACT, out-
performs the existing algorithms with regard to recall and, more often than not,
precision.
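
As a small worked illustration of these measures (the counts below are invented and
are not results from the evaluation), the following C fragment computes both:

#include <stdio.h>

/* Worked example of recall and precision for one piece; counts are invented. */
int main(void)
{
    int x = 7;    /* benchmark patterns the algorithm rediscovered   */
    int y = 10;   /* patterns in the analyst's benchmark             */
    int z = 3;    /* returned patterns outside the benchmark         */

    double recall    = (double) x / y;          /* x/y       = 0.70 */
    double precision = (double) x / (x + z);    /* x/(x + z) = 0.70 */

    printf("recall = %.2f, precision = %.2f\n", recall, precision);
    return 0;
}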

My presentation will give the definition of a translational pattern, discuss the
improvements outlined above, and demonstrate how these improvements are being
brought together in a user interface.

                                SELECTED REFERENCES

1. Collins, T., R. Laney, A. Willis, and P.H. Garthwaite, ‘Using discovered,
polyphonic patterns to filter computer-generated music’, in Proceedings of the
International Conference on Computational Creativity, Lisbon (2010), 1-10.

2. Conklin, D., and M. Bergeron, ‘Feature set patterns in music’, in Computer Music
Journal 32(1) (2008), 60-70.







3. Cope, D., Computational models of musical creativity (Cambridge Massachusetts:
MIT Press, 2005).

4. Meredith, D., K. Lemström, and G.A. Wiggins, ‘Algorithms for discovering
repeated patterns in multidimensional representations of polyphonic music’, in
Cambridge Music Processing Colloquium, Cambridge (2003), 11 pages.

5. Meredith, D., K. Lemström, and G.A. Wiggins, ‘Algorithms for discovering
repeated patterns in multidimensional representations of polyphonic music’, in
Journal of New Music Research 31(4) (2002), 321-345.







   Semantic Adaptivity and Social Networking in Personal
                 Learning Environments

                                     Joe Corneli
                              j.a.corneli@open.ac.uk

Supervisors          Alexander Mikroyannidis
                     Peter Scott
Department/Institute Knowledge Media Institute
Status               Fulltime
Probation viva       Before
Starting date        01/01/10



Introductory Remarks

I've decided to deal with "personal learning environments" with an eye towards the
context of their creation and use. This entails looking not just at ways to help support
learning experiences, but also at the complex of experiences and behaviours of the
many stakeholders who are concerned with learning. (E.g. educators, content
providers, software developers, institutional and governmental organizations.)

This broad view is compatible with the idea of a personal learning environment put
forward by the progenitors of the PLE model: "Rather than integrate tools within a
single context, the system should focus instead on coordinating connections between
the user and a wide range of services offered by organizations and other individuals."
(Wilson et al., 2006)

This problem area, which otherwise threatens to become hugely expansive, invites the
creation of a unified methodology and mode of analysis. A key aim of my work is
to develop such a method -- a sort of dynamic cartography. In this frame, the social
roles of stakeholders are to be understood through their constituent actions.

My analysis will then focus on the following question: How can mapping activity
patterns in a social context help us support the learning process more effectively?

Thematic Issues

In order to understand patterns of interaction with data well enough to make useful
maps, we must delve a bit into human sense-making behaviour. A small vocabulary
of actions related to sense-making provides a model we can then use quite
extensively.

People look for simplifying patterns. In a countervailing trend, they look for ways to
become more usefully interconnected and interoperable. To negotiate between
these two types of behaviour, they identify or create "points of coordination" which
provide mechanisms of control. They may do experiments, and then document how





these mechanisms generate effects in a more or less predictable way. Finally, they
develop explicit, shareable practices which achieve "desirable" effects.

Simplification, interconnection, control, experiment, motivation, and praxis -- these
are the thematic issues that inform my technical investigations.

Proposed Implementation Work

I plan to focus on implementation because it is an ideal place in which to refine and test
my ideas about dynamic maps. My efforts will be directed largely at
implementation in the following applications.

* Etherpad and other related tools for live online interactions --

All data about social interactions is interesting and potentially useful, but data about
"live" social interactions is becoming increasingly available in forms that are suitable
for large-scale computational analysis, and real-time use.

* RDF and related techniques for data management --

Marking up complex and changing relationships between objects is standard in e.g.
computer animation and computer games; it is interesting to think about how these
ideas can work in other domains (e.g. to assist with learning).

* Wordnet and Latent Semantic Analysis style approaches for clustering and
annotating data --

There are various techniques for dividing content into thematic clusters (useful for
supporting simplification behaviours needed for sense making), and for annotating
data with new relationships (useful for supporting interconnection behaviours). I will
explore these in various applications, e.g. applying them to the streams of data
identified above.

* Semantic Web style patterns for interoperability --

Content may still be king, but interfaces make up the board on which the game is
played. I plan to use an existing standard for mathematical documents (OMDoc) and
other API-building tools to help make the PlanetMath.org collection of mathematical
resources interoperable with e.g. OU's SocialLearn platform, contributing to the
development of a public service to STEM learners and practitioners worldwide.

* Documentation of technical processes --

PlanetMath.org is an example of a tool that has more content contributors than coders,
and more feature requests than anyone knows what to do with. Good documentation
is part of making hacking easier. Towards this end, I'm planning to build
PlanetComputing.org to document the software used on PlanetMath (and many other
projects).

Conclusion






By the end of my Ph. D. project, I hope to have built a "PLE IDE" -- a tool offering
personalized support for both learners and developers. I hope to have a robust theory
and practice of dynamical mapping that I will have tested out in several domains
related to online learning.

Reference

Wilson, S., Liber, O., Johnson, M., Beauvoir, P., Sharples, P., & Milligan, C. (2006).
Personal Learning Environments: Challenging The Dominant Design Of Educational
Systems. Proceedings of 2nd International Workshop on Learner-Oriented
Knowledge Management and KM-Oriented Learning, In Conjunction With ECTEL
06. (pp. 67-76), Crete, Greece.








                Investigating narrative ‘effects’: the case of suspense
                                     Richard Doust, richard.doust@free.fr

                                 Supervisors              Richard Power, Paul Piwek
                            Department/Institute                 Computing
                                   Status                         Part-time
                               Probation viva                      Before
                                Starting date                   October 2008


1     Introduction
Just how do narrative structures such as a Hitchcock film generate the well-known feeling of suspense? Our
goal is to investigate the structures of narratives that produce various narrative effects such as suspense, curiosity,
surprise. The fundamental question guiding this research could be phrased thus:
       What are the minimal requirements on formal descriptions of narratives such that we can capture these
       phenomena and generate new narratives which contain them?
Clearly, the above phenomena may depend also on extra-narrative features such as music, filming angles, and so
on. These will not be our primary concern here. Our approach consists of two main parts:
    1. We present a simple method for defining a Storybase which for our purposes will serve to produce different
       ‘tellings’ of the same story on which we can test our suspense modelling.
    2. We present a formal approach to generating the understanding of the story as it is told, and then use the
       output of this approach to suggest an algorithm for measuring the suspense level of a given telling of a story.
       We can thus compare different tellings of a story and suggest which ones will have high suspense, and which
       ones low.


2     Suspense
2.1     Existing definitions
Dictionary definitions of the word ’suspense’ suggest that there really ought to be several different words for what
is more like a concept cluster than a single concept. The Collins English dictionary gives three definitions:
    1. apprehension about what is going to happen. . .
    2. an uncertain cognitive state; "the matter remained in suspense for several years" . . .
    3. excited anticipation of an approaching climax; "the play kept the audience in suspense" anticipation, ex-
       pectancy - an expectation.
Gerrig and Bernardo (1994) suggest that reading fiction involves constantly looking for solutions to the plot-based
dilemmas faced by the characters in a story world. One of the suggestions which comes out of this work is that
suspense is greater the fewer solutions to the hero’s current problem the reader can find.
Cheong and Young’s (2006) narrative generating system uses the idea that a reader’s suspense level depends on
the number and type of solutions she can imagine in order to solve the problems facing the narrative’s preferred
character.
    Generally, it seems that more overarching and precise definitions of suspense are wanting in order to connect
some of the above approaches. The point of view we will assume is that the principles by which literary narratives
are designed are obscured by the lack of sufficiently analytical concepts to define them. We will use as our starting
point work on stories by Brewer and Lichtenstein (1981) which seems fruitful in that it proposes not only a view of
suspense, but also of related narrative phenomena such as surprise and curiosity.

2.2     Brewer and Lichtenstein’s approach
Brewer and Lichtenstein (1981) propose that there are three major discourse structures which account for the
enjoyment of a large number of stories: surprise, curiosity and suspense. For suspense, there must be an initiating
event which could lead to significant consequences for one of the characters in the narrative. This event leads to
the reader feeling concern about the outcome for this character, and if this state is maintained over time, then the
reader will feel suspense. As Brewer and Lichtenstein say, often ‘additional discourse material is placed between
the initiating event and the outcome event, to encourage the build up of suspense’ (Brewer and Lichtenstein, 1981,
p.17).
    Much of the current work can be seen as an attempt to formalise and make robust the notions of narrative
understanding that Brewer laid out. We will try to suggest a model of suspense which explains, for example, how
the placing of additional material between the initiating event and the outcome event increases the suspense felt in
a given narrative. We will also suggest ways in which curiosity and surprise could be formally linked to suspense.
We also hope that our approach will be able to shed some light on the techniques for creating suspense presented
in writer’s manuals.


3     The storybase
3.1     Event structure perception
Our starting point for analysing story structure is a list of (verbally described) story events. Some recent studies
(Speer, 2007) claim that people break narratives down into digestible chunks in this way. If this is the case, then
we should expect to discover commonalities between different types of narrative (literature, film, storytelling),
especially as regards phenomena such as suspense. One goal of this work is to discover just these commonalities.

3.2     Storybase: from which we can talk about variants of the ’same’ story
One of the key points that Brewer and Lichtenstein make is that the phenomenon of suspense depends on the order
in which information about the story is released, as well as on which information is released and which withheld.
One might expect, following this account, that telling ‘the same story’ in two different ways might produce different
levels of suspense.
    In order to be able to test different tellings of the same story, we define the notion of a STORYBASE. This
should consist of a set of events, together with some constraints on the set. Any telling of the events which obeys
these constraints should be recognised by most listeners as being ‘the same story’. We define four types of link
between the members of the set of possible events:
    • Starting points, Event links, Causal constraints, Stopping points.
The causal constraints can be positive or negative. They define, for example, which events must already have been
told before others can be told. Our approach can be seen as a kind of specialised story grammar for
a particular story. The grammar generates ‘sentences’, and each ‘sentence’ is a different telling of the story. The
approach is different from story schemas. We are not trying to encode information about the world at this stage; any
story form is possible. With this grammar, we can generate potentially all of the possible tellings of a given story
which are recognisably the same story, and in this way, we can test our heuristics for meta-effects such as suspense
on a whole body of stories.
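
As a minimal sketch of how a storybase might be encoded (the event names and the matrix
encoding of positive causal constraints are assumptions made purely for illustration):

#define N_EVENTS 4

enum story_event { SETTING, INITIATING_EVENT, COMPLICATION, OUTCOME_EVENT };

/* before[a][b] = 1 means event a must already have been told before
 * event b may be told (a positive causal constraint). */
static const int before[N_EVENTS][N_EVENTS] = {
    [INITIATING_EVENT][COMPLICATION]  = 1,
    [INITIATING_EVENT][OUTCOME_EVENT] = 1,
};

/* A telling is an ordering of all the events; it counts as a telling of the
 * 'same story' if every positive constraint is respected. */
static int valid_telling(const enum story_event telling[N_EVENTS])
{
    int told[N_EVENTS] = { 0 };
    for (int i = 0; i < N_EVENTS; i++) {
        for (int a = 0; a < N_EVENTS; a++)
            if (before[a][telling[i]] && !told[a])
                return 0;              /* a required earlier event is missing */
        told[telling[i]] = 1;
    }
    return 1;
}

Enumerating the orderings accepted by valid_telling then yields the body of closely
related tellings on which the suspense heuristics below can be tested.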


4     Inference
4.1     Inference types
To model the inferential processes which go on when we listen to or read a story, or watch a film, we define three
types of inference:
    1. Inference of basic events from sensory input: a perceived action in the narrative together with an ‘event
       classifier module’ produces a list of ordered events.
    2. Inferences about the current state of the story (or deductions).
    3. Inferences about the future state of the story (or predictions).

Clearly these inferential processes also rely on general knowledge about the world or the story domain, and
even about stories themselves.
    So, for each new story event we build up a set of inferences STORYSOFAR of these three types. At each new
story event, new inferences are generated and old inferences rejected. There is a constant process of maintenance
of the logical coherence of the set of inferences as the story is told. To model this formally, we create a set of
‘inferential triples’ of the form: “if X and Y then Z” or X.Y->Z, where X, Y, and Z are Deductions or Predictions.
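
A minimal sketch of how such triples might be represented and applied is given below;
the array-based encoding of STORYSOFAR as a vector of flags is an assumption made for
illustration.

/* "if X and Y then Z", where x, y and z index deductions or predictions. */
struct triple { int x, y, z; };

/* holds[i] != 0 means inference i is currently in STORYSOFAR.  Re-firing the
 * triples until nothing changes keeps the set closed under inference as each
 * new story event adds or removes entries. */
static void apply_triples(int holds[], const struct triple rules[], int n_rules)
{
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 0; i < n_rules; i++)
            if (holds[rules[i].x] && holds[rules[i].y] && !holds[rules[i].z]) {
                holds[rules[i].z] = 1;
                changed = 1;
            }
    }
}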


5     Measuring suspense
5.1    A ‘suspense-grammar’ on top of the storybase
To try to capture phenomena such as suspense, curiosity and surprise, we aim to create and test different algorithms
which take as their input the generated story, together with the inferences generated by the triples mentioned above.
A strong feature of this approach is that we can test our algorithms on a set of very closely related stories which
have been generated automatically.

5.2    Modelling conflicting predictions
Our current model of suspense is based on the existence of conflicting predictions with high salience. (This notion
of the salience of a predicted conflict could be defined in terms of the degree to which whole sets of following
predictions for the characters in the narrative are liable to change. For the moment, intuitively, it relates to how
the whole story might ‘flow’ in a different direction.) For the story domain, we construct the set INCOMP of pairs
of mutually conflicting predictions with a given salience:
      INCOMP = { (P1,NotP1,Salience1), (P2,NotP2,Salience2), . . . }
We can now describe a method for modelling the conflicting predictions triggered by a narrative. If at time T, P1
and NotP1 are members of STORYSOFAR, then we have found two incompatible predictions in our ‘story-so-far’.

5.3    The predictive chain
We need one further definition in order to be able to define our current suspense measure for a story. For a given
prediction P1, we (recursively) define the ’prediction chain’ function C of P1:
      C(P1) is the set containing P1 itself, together with all predicted events P such that P.y -> P’ for some
      y, where P’ is a member of C(P1).

5.4    Distributing salience as a rough heuristic for modelling suspense in a narrative
Suppose we have a predicted conflict between predictionA and predictionB which has a salience of 10. In these
circumstances, it would seem natural to ascribe the salience of 5 to each of the (at least) two predicted events
predictionA and predictionB which produce the conflict. Now suppose that leading back from predictionA there is
another predictionC that needs to be satisfied for predictionA to occur. How do we spread out the salience of
the conflict over these different predicted events?

5.5    A ’thermodynamic’ heuristic for creating a suspense measure
A predicted incompatibility as described above triggers the creation of CC(P1,P2,Z), the set of two causal chains
C(P1) and C(P2) which lead up to these incompatible predictions. Now we have:
      CC(P1,P2,Z) = C(P1) + C(P2)
To determine our suspense heuristic, we first find the size L of CC(P1,P2,Z). At each story step we define the
suspense level S in relation to the conflicting predictions P1 and P2 as S = Z / L. Intuitively, one might say that
the salience of the predicted incompatibility is ’spread over’ or distributed over the relevant predictions that lead up
to it. We can call this a ‘thermodynamic’ model because it is as if the salience or ‘heat’ of one predicted conflicting
moment is transmitted back down the predictive line to the present moment. All events which could have a bearing
on any of the predictions in the chain are for this reason subject to extra attention.
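
A rough C sketch of this heuristic is given below; representing the pair of chains only by
the number L of outstanding predictions is a simplification made here for illustration. Each
story step contributes S = Z/L, and the chain shrinks as predictions are confirmed or
annulled, which is exactly the accumulation described next.

/* Salience Z of a predicted conflict spread over the L predictions still
 * outstanding on the chains leading up to it.  At each step the suspense
 * level is Z / L; the total accumulates as the chain shrinks to nothing. */
static double total_suspense(double salience_z, int chain_length_l)
{
    double total = 0.0;
    for (int remaining = chain_length_l; remaining > 0; remaining--)
        total += salience_z / remaining;   /* S = Z / L at this story step */
    return total;
}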


   If the set of predictions stays the same over a series of story steps, and if, as a first approximation, we assume
that the suspensefulness of a narrative is the sum of the suspense levels of its story steps, then we can say
that the narrative in question will have a total suspense level S-total relative to this particular predicted conflict of
      S-total = Z/L + Z/(L-1) + Z/(L-2) + . . . + Z/1
as the number of predictions in CC(P1,P2,Z) decreases each time a prediction is either confirmed or annulled. To
summarise, we can give a working definition of suspense as follows:

5.6     Definition of suspense
      Definition: the suspense level of a narrative depends on the salience of predicted con-
      flicts between two or more possible outcomes and on the amount of story time that these
      predicted conflicts remain unresolved and ‘active’.

From this definition of suspense we would expect two results:
    1. the suspense level at a given story step will increase as the number of predictions necessary to be confirmed
       leading up to the conflict decreases, and
    2. the way to maximise suspense in a narrative is for the narrative to ‘keep active’ predicted incompatibilities
       with a high salience over several story steps.
In fact, this may be just how suspenseful narratives work. One might say,
      suspenseful narratives engineer a spreading of the salience of key moments backwards in
      time, thus maintaining a kind of tension over sufficiently long periods for emotional effects
      to build up in the spectator.


6     Summary
We make two claims:
    1. The notion of a storybase is a simple and powerful way to generate variants of the same story.
    2. Meta-effects of narrative can be tested by using formal algorithms on these story variants. These algorithms
       build on modelling of inferential processes and knowledge about the world.


7     References
    • Brewer, W. F. (1996). The nature of narrative suspense and the problem of rereading. In P. Vorderer,
      H. J. Wulff, and M. Friedrichsen (Eds.), Suspense: Conceptualizations, theoretical analyses, and empirical
      explorations. Mahwah, NJ: Lawrence Erlbaum Associates. 107-127.
    • Brewer, W.F., and Lichtenstein, E. H. (1981). Event schemas, story schemas, and story grammars. In J.
      Long and A. Baddeley (Eds.), Attention and Performance IX. Hillsdale, NJ: Lawrence Erlbaum Associates.
      363-379.
    • Cheong, Y.G. and Young, R.M. 2006. A Computational Model of Narrative Generation for Suspense. In
      Computational Aesthetics: Artificial Intelligence Approaches to Beauty and Happiness: Papers from the 2006
      AAAI Workshop, ed. Hugo Liu and Rada Mihalcea, Technical Report WS-06-04. American Association for
      Artificial Intelligence, Menlo Park, California, USA, pp. 8- 15.

    • Gerrig, R. J., and Bernardo, A. B. I. (1994). Readers as problem-solvers in the experience of suspense. Poetics,
      22(6), 459-472.
    • Speer, N. K., Zacks, J. M., & Reynolds, J. R. (2007). Human brain activity time-locked to narrative event
      boundaries. Psychological Science, 18, 449-455.







Verifying Authentication Properties of C Security
      Protocol Code Using General Verifiers
                             François Dupressoir


     Supervisors      Andy Gordon (MSR)
                       Jan Jürjens (TU Dortmund)
                      Bashar Nuseibeh (Open University)
     Department       Computing
     Registration     Full-Time
     Probation        Passed

1    Introduction
Directly verifying security protocol code could help prevent major security flaws
in communication systems. C is usually used when implementing security soft-
ware (e.g. OpenSSL, cryptlib, PolarSSL...) because it provides control over
side-channels, performance, and portability all at once, along with being easy
to call from a variety of other languages. But those strengths also make it hard
to reason about, especially when dealing with high-level logical properties such
as authentication.


Verifying high-level code. The most advanced results on verifying imple-
mentations of security protocols tackle high-level languages such as F#. Two
main verification trends can be identified on high-level languages. The first
one aims at soundly extracting models from the program code, and using a
cryptography-specific tool such as ProVerif (e.g. fs2pv [BFGT06]) to verify that
the extracted protocol model is secure with respect to a given attacker model.
The second approach, on the other hand, aims at using general verification tools
such as type systems and static analysis to verify security properties directly
on the program code. Using general verification tools permits a user with less
expert knowledge to verify a program, and also allows a more modular approach
to verification, even in the context of security, as argued in [BFG10].

Verifying C code. But very few widely-used security-oriented programs are
written in such high-level languages, and lower-level languages such as C are
usually favoured. Several approaches have been proposed for analysing C secu-
rity protocol code [GP05, ULF06, CD08], but we believe them unsatisfactory
for several reasons:
    • memory-safety assumptions: all three rely on assuming memory-safety








      properties (which may sometimes be purposefully broken as a source of randomness),
    • trusted manual annotations: all three rely on a large amount of trusted
      manual work,
    • unsoundness: both [CD08] and [ULF06] make unsound abstractions and
      simplifications, which is often not acceptable in a security-critical context,
    • scalability issues: [CD08] is limited to a bounded (and in practice small) number
      of parallel sessions, and we believe [GP05] is limited to small programs due
      to its whole-program analysis approach.

1.1     Goals
Our goal is to provide a new approach to soundly verify Dolev-Yao security
properties of real C code, with a minimal amount of unverified annotations and
assumptions, so that it is accessible to non-experts. We do not aim at verifying
implementations of encryption algorithms and other cryptographic operations,
but their correct usage in secure communication protocols such as TLS.


2      Framework
Previous approaches to verifying security properties of C programs did not de-
fine attacker models at the level of the programming language, since they were
based on extracting a more abstract model from the analysed C code (CSur and
Aspier), or simply verified compliance of the program to a separate specification
(as in Pistachio). However, to achieve our scalability goals, we choose to define
an attacker model directly on C programs, which enables a modular verification of
the code.
To avoid issues related to the complex, and often very informal, semantics of the
C language, we use the F7 notion of a refined module (see [BFG10]). In F7,
a refined module consists of an imported and an exported interface, contain-
ing function declarations and predicate definitions, along with a piece of type-
checked F# code. The main result states that a refined module with empty
imported interface cannot go wrong, and careful use of assertions allows one
to statically verify correspondence properties of the code. Composition results
can also be used to combine existing refined modules whilst ensuring that their
security properties are preserved.
We define our attacker model on C programs by translating F7 interfaces into
annotated C header files. The F7 notion of an opponent, and the corresponding
security results, can then be transferred to C programs that implement an F7-
translated header. The type-checking phase in F7 is, in the case of C programs,
replaced by a verification phase, in our case using VCC. We trust that VCC is
sound, and claim that verifying that a given C program correctly implements
a given annotated C header entails that there exists an equivalent (in terms of
attacks within our attacker model) F7 implementation of that same interface.








3    Case Study
We show how our approach can be used in practice to verify a simple implemen-
tation of an authenticated Remote Procedure Call protocol that authenticates
the pair of communicating parties using a pre-shared key, and links requests
and responses together. We show that different styles of C code can be verified
using this approach, with varying levels of required annotations, very few of
which are trusted by the verifier. We argue that a large part of the required
annotations are memory-safety related and would be necessary to verify other
properties of the C code, including the memory-safety assumptions made by previous
approaches.
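
To make the flavour of the required annotations concrete, the fragment below is a minimal,
purely illustrative C sketch (it is not the VCC-annotated code verified in the case study):
the responder only accepts a request for which a matching "begin" event was logged by the
initiator, which is the shape of correspondence property the approach proves statically
rather than, as here, checking at run time. The event log and all identifiers are
hypothetical.

    #include <assert.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical event log standing in for the logical "begin" events
     * used when stating correspondence (authentication) properties. */
    #define MAX_EVENTS 16
    static char requested[MAX_EVENTS][64];
    static int n_requested = 0;

    /* Initiator: record a begin event before sending the request; the real
     * code would then MAC the request with the pre-shared key and send it. */
    static void initiator_send(const char *req) {
        assert(n_requested < MAX_EVENTS);
        strncpy(requested[n_requested], req, 63);
        requested[n_requested][63] = '\0';
        n_requested++;
    }

    /* Responder: accepting a request is only justified if a matching begin
     * event exists; a verifier would discharge this assertion statically. */
    static void responder_accept(const char *req) {
        bool begun = false;
        for (int i = 0; i < n_requested; i++)
            if (strcmp(requested[i], req) == 0) { begun = true; break; }
        assert(begun);
        printf("accepted request: %s\n", req);
    }

    int main(void) {
        initiator_send("read /etc/motd");
        responder_accept("read /etc/motd");
        return 0;
    }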


4    Conclusion
We define an attacker model for C code by interpreting verified C programs as
F7 refined modules. We then describe a method to statically prove the impos-
sibility of attacks against C code in this attacker model using VCC [CDH+09],
a general C verifier. This approach does not rely on unverified memory-safety
assumptions, and the amount of trusted annotations is minimal. We also believe
it is as sound and scalable as the verifier that is used. Moreover, we believe our
approach can be adapted for use with any contract-based C verifier, and could
greatly benefit from the important recent developments in that area.


References
[BFG10]    Karthikeyan Bhargavan, Cédric Fournet, and Andrew D. Gordon.
           Modular verification of security protocol code by typing. In Proceed-
           ings of the 37th Annual ACM SIGPLAN-SIGACT Symposium on
           Principles of Programming Languages (POPL '10), pages 445–456,
           Madrid, Spain, 2010.
[BFGT06] Karthikeyan Bhargavan, Cédric Fournet, Andrew D. Gordon, and
         Stephen Tse. Verified interoperable implementations of security pro-
         tocols. In CSFW '06: Proceedings of the 19th IEEE Workshop on
         Computer Security Foundations, pages 139–152, Washington, DC,
         USA, 2006. IEEE Computer Society.
[CD08]     Sagar Chaki and Anupam Datta. ASPIER: an automated framework
           for verifying security protocol implementations. Technical Report
           CMU-CyLab-08-012, CyLab, Carnegie Mellon University, 2008.

[CDH+09] Ernie Cohen, Markus Dahlweid, Mark Hillebrand, Dirk Leinenbach,
          Michal Moskal, Thomas Santen, Wolfram Schulte, and Stephan To-
          bies. VCC: a practical system for verifying concurrent C. In Pro-
          ceedings of the 22nd International Conference on Theorem Prov-
          ing in Higher Order Logics, pages 23–42, Munich, Germany, 2009.
          Springer-Verlag.
[GP05]     Jean Goubault-Larrecq and Fabrice Parrennes. Cryptographic pro-
           tocol analysis on real C code. In Proceedings of the 6th International








          Conference on Verification, Model Checking and Abstract Interpre-
          tation (VMCAI’05), volume 3385 of Lecture Notes in Computer Sci-
           ence, pages 363–379. Springer, 2005.

[ULF06]   Octavian Udrea, Cristian Lumezanu, and Jeffrey S. Foster. Rule-
          based static analysis of network protocol implementations. In Pro-
          ceedings of the 15th USENIX Security Symposium, pages 193–208,
          2006.







 Agile development and usability in practice: Work cultures
                      of engagement

                                         Jennifer Ferreira
                                      j.ferreira@open.ac.uk

Supervisors          Helen Sharp
                     Hugh Robinson
Department/Institute Computing
Status               Fulltime
Probation viva       After
Starting date        February, 2008



Abstract. Combining usability and Agile development is a complex topic. My academic research,
combined with my research into practice, suggests three perspectives from which the topic can be
usefully examined. The first two (addressing focus and coordination issues) are typically the
perspectives taken in the literature and are popular items for discussion. I propose that there is a third,
largely unexplored perspective that requires attention, that of how developers and designers engage in
the context of their work cultures.
1 Introduction
Both disciplines are still in a state of uncertainty about how one relates to the other — in terms of
whether they are addressing the same underlying issues, whether they belong to and should be
recognised as one “process”, who takes the lead and who adjusts to whom. The complexity of the
problem arises from practitioner and academic contributions to the literature, as well as the varying
perspectives the contributors hold. Complexity further arises from the practical settings in which the
problem plays out, settings characterised by different balances of power and different levels of
influence the designers and developers may have on determining how they work. What is clear is that
the solutions proposed follow the ways in which the problem is conceptualised. It certainly matters
how the problem is conceptualised, as this reflects which issues are important enough to address and
the ways to go about doing that. In light of this, we can unpick from the complexity three emerging
strands of discussion that deal with usability in an agile domain.
   For the benefit of the following discussion, I am making the assumption that a developer
constituency exists separately from a designer constituency. Further, that if questioned, a developer
would not consider themselves doing the work of a designer and vice versa. Of course, this is not
always the case in practice. I have encountered Agile teams with no dedicated usability person assigned
to work with the team, where developers were addressing usability-related issues as part of their
everyday work. This illustrates yet another layer of complexity associated with practice that must be
acknowledged, but cannot be adequately addressed within the limitations of this paper.
2 A question of focus
In the first perspective, the combination of usability approaches with Agile approaches helps
practitioners focus on important aspects of software development. While Agile approaches focus on
creating working software, usability approaches focus on creating a usable design that may or may not
be in the form of working software. A central concern of this perspective is how to support the
weaknesses of one with the strengths of the other. Agile approaches are seen to lack an awareness of
usability issues, with little guidance for how and when designers contribute to the process. Usability
approaches are seen to lack a structured approach to transforming designs into working software and,
therefore, little guidance on how developers are involved. The two are thus seen as complementary
approaches that, used together, improve the outcome of the software development effort. This often
serves as the motivation for combining Agile development and usability in the first place. We find
examples in the literature that combine established Agile approaches, e.g., eXtreme Programming, or






Scrum, with established design approaches, e.g., Usage-Centered Design [6], Usability Engineering
[5]. We also find examples of well-known HCI techniques such as personas [1] and scenarios [3] being
used on Agile projects.
3 A question of coordination
The second perspective on how to bring usability and Agile development together is one where it is
considered a problem of coordination. That is, the central concern is how to allow the designers and
developers to carry out their individual tasks, and bring them together at the appropriate points.
Designers require enough time at the outset of the project to perform user research and sketch out a
coherent design. To fit with the time-boxed Agile cycles, usability techniques are often adapted to fit
within shorter timescales. Advice is generally to have designers remain ahead of the developers, so that
they have enough time to design for what is coming ahead and evaluate what has already been
implemented. In the literature we find examples of process descriptions as a way of addressing this
coordination issue. They provide a way to mesh the activities of both designers and developers, by
specifying the tasks that need to be performed in a temporal sequence (e.g., [4]).
4 Work cultures of engagement
The third perspective addresses practical settings and has received little attention so far. In this
perspective, rather than concentrating on processes or rational plans that abstract away from the
circumstances of the actions, the situatedness of the work of the developers and designers is
emphasised. This perspective encompasses both of those discussed above, while acknowledging that
issues of coordination and focus are inextricably linked with the setting in which practitioners work.
That is, how the developers and designers coordinate their work, and how focus is maintained in
practice, is shaped and sustained by their work setting.
  By work culture I specifically mean the “set of solutions produced by a group of people to meet
specific problems posed by the situation that they face in common” [2, p.64], in a work setting. If
developers and designers are brought together by an organisation, they will be working together amid
values and assumptions about the best way to get the work done — the manifestations of a work
culture. I combine work cultures with engagement to bring the point across that how developers and
designers engage with one another depends in essential ways on the embedded values and assumptions
regarding their work and what is considered appropriate behaviour in their circumstances.
   My research into practice has provided evidence for how practical settings shape developers and
designers engaging with one another. We find that developers and designers get the job done through
their localised, contingent and purposeful actions that are not explained by the perspectives above.
Further, the developers and designers can be embedded in the same work culture, such that they share
values, assumptions and behaviours for getting the work done. But we have also encountered examples
where developers and designers are in separate groups and embedded in distinct work cultures.
Engaging in this sense requires that individuals step outside their group boundaries and figure out how
to deal with each other on a daily basis — contending with very different values, assumptions and
behaviours compared to their own.
  This is an important perspective to consider because of the implications for practice that it brings —
highlighting the role of work culture, self-organisation and purposeful work. It is also a significant
perspective, since we are unlikely to encounter teams in practice who are fully self-directed and
independent of other teams, individuals or organisational influences.
5 Concluding remarks
As we work through the problems that crossing disciplinary boundaries suggests, we simultaneously
need an awareness of which conception of the problem is actually being addressed. In this paper I have
identified a third perspective requiring attention, where we take account of the work settings in which
the combination of Agile development and usability is played out. According to this perspective, it
would be unrealistic to expect that one ideal approach would emerge and successfully translate to any
other work setting. Instead, it shifts attention to the work cultures involved in usability and Agile
development in practice. It shows how understanding and supporting the mechanisms of the work
cultures that achieve engagement in that setting, contribute to understanding and supporting the
mechanisms that enable usability in an agile domain.
References
    1. Haikara, J.: Usability in Agile Software Development: Extending the Interaction Design
         Process with Personas Approach. In: Concas, G., Damiani, E., Scotto, M., Succi, G. (eds.)






    Agile Processes in Software Engineering and Extreme Programming. LNCS, vol. 4536/2007,
    pp. 153–156. Springer, Berlin/Heidelberg (2007)
2. Vaughan, D.: The Challenger Launch Decision: Risky technology, culture and deviance at
    NASA. The University of Chicago Press, Chicago and London (1996)
3. Obendorf, H., Finck, M.: Scenario-based usability engineering techniques in agile
    development processes. In: CHI ’08 Extended Abstracts on Human Factors in Computing
    Systems (Florence, Italy, April 05 - 10, 2008), pp. 2159–2166. ACM, New York, NY (2008)
4. Sy, D.: Adapting usability investigations for Agile user-centered design. Journal of Usability
    Studies 2(3), 112–132 (2007)
5. Kane, D.: Finding a Place for Discount Usability Engineering in Agile Development:
    Throwing Down the Gauntlet. In: Proceedings of the Conference on Agile Development (June
     25-28, 2003), p. 40. IEEE Computer Society, Los Alamitos, CA (2003)
6. Patton, J.: Hitting the target: adding interaction design to agile software development. In:
    OOPSLA 2002 Practitioners Reports (Seattle, Washington, November 04 - 08, 2002), pp. 1-ff.
    ACM, New York, NY (2002)







                 Model Driven Architecture of Large
                 Distributed Hard Real Time Systems

                              Michael A Giddings
                             mag2@tutor.open.ac.uk

Supervisors          Dr Pat Allen
                     Dr Adrian Jackson
                     Dr Jan Jürjens
                     Dr Blaine Price
Department/Institute Department of Computing
Status               Part-time
Probation viva       Before
Starting date        1 October 2008

1.   Background

Distributed Real-time Process Control Systems are notoriously difficult to develop.
The problems are compounded where there are multiple customers and the design
responsibility is split up between different companies based in different countries. The
customers are typically users rather than developers and the domain expertise resides
within organisations whose domain experts have little software expertise.

Two types of Distributed Real-time Process Control Systems are open loop systems
(without feedback) and closed loop systems (with feedback). Typical examples include
the display of sensor data and the control of actuators based on sensor data. Typically
systems contain a mixture of periodic and event driven processing with states
changing much more slowly than individual periodic processing steps.

In addition to the functional requirements, non functional requirements are also
needed to describe the desired operation of the software system. A number of these
requirements may be grouped together as performance requirements. Performance
requirements are varied and depend on the particular system to which they refer. In
early systems performance was managed late in the development process on a ‘fix it
later’ basis (Smith 1990). As software systems became more sophisticated it became
necessary to manage performance issues as early as possible to avoid the cost impact
of late detected performance failures.

2.   The Problem

The need for modelling performance for the early detection of performance failures is
well established (Smith 1990). Recent surveys have shown that the adoption of the
Unified Modelling Language (UML) in software systems development remains low at
16% with no expected upturn. The use of trial and error methods in embedded system
development remains at 25% (Sanchez and Acitores 2009).







A number of summary papers exist that list the performance assessment methods and
tools (Smith 2007; Balsamo, Di Marco et al. 2004; Koziolek 2009; Woodside, Franks
et al. 2007). These identify performance assessment methods
suitable for event driven systems, client/server systems, layered queuing networks and
systems with shared resources.

Fifteen performance approaches identified to combat the ‘fix-it-later’ approach have
been summarised (Balsamo, Di Marco et al. 2004). These methods apply to a broad
range of software systems and performance requirements. In particular they cover
shared resources (Hermanns, Herzog et al. 2002), client/servers (Huhn, Markl et al.
2009) and event driven systems (Staines 2006) (Distefano, Scarpa et al. 2010) and
mainly focus on business systems. Each of these performance methods can contribute
to the performance analysis of Distributed Real-time Process Control Systems but rely
on system architecture and software design being wholly or partly complete.

3.   Proposed Solution

In this paper I propose modelling individual system elements (sensors, actuators,
displays and communication systems) as periodic processes associated with a
statistical description of the errors and delays. Existing performance methods based
on MARTE (OMG 2009) using the techniques described above can be used for
individual elements to calculate performance. The proposed methodology, however,
enables models to be developed early for systems which comprise individual
processing elements, sensors, actuators, displays and controls linked by a bus
structure prior to the development of UML models.

System architects establish the components and component communications early in
the system lifecycle. Tools based on SysML 1.1 (OMG 2008) provide a method of
specifying the system architecture. These design decisions frequently occur prior to
any detailed performance assessment. Early performance predictions enable
performance requirements to be established for individual system elements with a
greater confidence than the previous ‘fix-it-later’ approach (Eeles 2009).

It has been claimed (Lu, Halang et al. 2005; Woodside, Franks et al. 2007) that Model
Driven Architecture (MDA) (OMG 2003) is able to aid in assessing performance. A
periodic processing architecture may enable early assessment of performance by
permitting loosely coupled functional elements to be used as building blocks of a
system. A high level of abstraction and automatic translation between models can be
achieved using functional elements. Platform independent models for the individual
components of the system mixed with scheduling information for each component
may enable the impact of functional changes and real performance to be assessed
early in the development process. Models for individual elements can be combined
taking into account that the iteration schedules for each element are not synchronised
with each other. These models can be animated or performance calculated with
established mathematical methods (Sinha 1994).
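
As a small, hedged illustration of the kind of calculation this enables (it is not the
mathematical method cited above, and the element names and periods below are invented), the
C fragment applies one simple conservative bound: because the iteration schedules are not
synchronised, a value can wait up to one full period at each stage, so the end-to-end delay
of a functional chain is bounded by the sum of the element periods, execution times being
taken here as negligible.

    #include <stdio.h>

    /* A functional element of a chain, running on its own unsynchronised
     * periodic schedule (hypothetical names and rates). */
    typedef struct {
        const char *name;
        double period_ms;
    } chain_element;

    int main(void) {
        chain_element chain[] = {
            { "pressure sensor input", 20.0 },
            { "filtering element",     40.0 },
            { "display update",       100.0 },
        };
        double bound_ms = 0.0;
        for (unsigned i = 0; i < sizeof chain / sizeof chain[0]; i++)
            bound_ms += chain[i].period_ms;  /* up to one period of waiting per stage */
        printf("conservative end-to-end delay bound: %.1f ms\n", bound_ms);
        return 0;
    }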

One way that MDA may be used to provide early performance assessment is to
develop a functional model similar to CoRE (Mullery 1979) alongside the UML
(OMG 2003) models in the MDA Platform Independent Model. The functional model






can then be developed by domain experts without any knowledge of software
techniques.

For central system computers it can also be used to identify classes and methods in
the MDA Platform Independent Model by a simple semi-automatic process similar to
the traditional noun and verb identification methods. It can be used to identify simple
functional elements which can be implemented as part of a periodic iteration
architecture. Animation of these functional elements at the requirements stage may be
undertaken in a way which will reflect the actual performance of the computer.

Non periodic processing elements, bus systems, sensors, actuators, displays and
controls can be represented by abstractions based on an iteration schedule. This model
can be used to specify the requirements for individual elements.

Connections between the independent functional elements which represent the
notional data flow across a periodic system can be used to establish functional chains
which can identify all the functional elements that relate to each specific end event.
Each functional chain can then be analysed into a collection of simple sub-chains, not
all of which will have the same performance requirements when combined to meet the
overall performance requirement. When each of the sub-chains has been allocated its
own performance criteria individual functional elements can be appropriately
scheduled within a scheduling plan with each element only being scheduled to run
sufficiently frequently to meet the highest requirement of each sub-chain. This leads
to a more efficient use of processing capacity than conventional periodic systems.
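
A minimal sketch of what such a scheduling plan might look like in code is given below (all
element names, periods and the schedule itself are hypothetical, and real systems would be
far more elaborate): a simple cyclic executive runs each functional element only as
frequently as the most demanding sub-chain containing it requires.

    #include <stdio.h>

    /* A functional element together with the iteration rate allocated to it
     * from the performance requirements of its sub-chains (hypothetical). */
    typedef struct {
        const char *name;
        int period_ticks;      /* run once every N minor cycles */
        void (*step)(void);
    } element;

    static void read_sensor(void)    { /* sample the input */ }
    static void filter_value(void)   { /* smooth the latest sample */ }
    static void update_display(void) { /* refresh the operator display */ }

    static element schedule[] = {
        { "read_sensor",    1, read_sensor },     /* every minor cycle      */
        { "filter_value",   2, filter_value },    /* every other cycle      */
        { "update_display", 4, update_display },  /* a quarter of the rate  */
    };

    int main(void) {
        const int minor_cycles = 8;  /* one major frame of the cyclic executive */
        for (int tick = 0; tick < minor_cycles; tick++) {
            for (unsigned i = 0; i < sizeof schedule / sizeof schedule[0]; i++) {
                if (tick % schedule[i].period_ticks == 0) {
                    printf("tick %d: %s\n", tick, schedule[i].name);
                    schedule[i].step();
                }
            }
        }
        return 0;
    }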

This provides three opportunities to animate the overall system which should produce
similar results. The first opportunity is to schedule algorithms defined within the
definition of each functional element in the functional model associated with the
MDA Platform Independent Model. The second opportunity is to animate the object
oriented equivalent of the functional chain in the UML models in the MDA Platform
Independent Model (PIM) for the central processing elements. These would combine
sequence diagrams which represent the functional model functional elements and
objects and attributes of objects to represent the notional data flow. These would be
combined with the functional chains for the remaining system elements. The third
opportunity is to replace the functional chains generated from the PIM with
implemented functional elements from the MDA Platform Specific Models (PSMs).

Each animation would use standard iteration architectures to execute each functional
element in the right order at the correct moment in accordance with regular
predictable scheduling tables. The iteration parameters can be generated in a form
which can be applied to each animation opportunity and final implementation without
modification.

Functional chains can be extracted from the functional model and animated
independently enabling full end to end models to be animated using modest
computing resources.







4.   Conclusion

The proposed methodology enables performance to be animated or calculated early in
the design process, automatically generating models focused on the sections of the
system that relate to individual performance end events, before architectural and
software structures are finalised.

5. References

Balsamo, S., A. Di Marco, et al. (2004). "Model-based performance prediction in
       software development: a survey." Software Engineering, IEEE Transactions
       on 30(5): 295-310.
Distefano, S., M. Scarpa, et al. (2010). "From UML to Petri Nets: the PCM-Based
       Methodology." Software Engineering, IEEE Transactions on PP(99): 1-1.
Eeles, P. and Cripps, P. (2009). The Process of Software Architecting, Addison Wesley
       Professional.
Hermanns, H., U. Herzog, et al. (2002). "Process algebra for performance evaluation."
       Theoretical Computer Science 274(1-2): 43-87.
Huhn, O., C. Markl, et al. (2009). "On the predictive performance of queueing
       network models for large-scale distributed transaction processing systems."
       Information Technology & Management 10(2/3): 135-149.
Koziolek, H. (2009). "Performance evaluation of component-based software systems:
       A survey." Performance Evaluation In Press, Corrected Proof.
Lu, S., W. A. Halang, et al. (2005). A component-based UML profile to model
       embedded real-time systems designed by the MDA approach. Embedded and
       Real-Time Computing Systems and Applications, 2005. Proceedings. 11th
       IEEE International Conference on.
Mullery, G. P. (1979). CORE - a method for controlled requirement specification.
       Proceedings of the 4th international conference on Software engineering.
       Munich, Germany, IEEE Press.
OMG. (2003). "MDA Guide Version 1.0.1 OMG/2003-06-01." from
       <http://www.omg.org/docs/omg/03-06-01.pdf>.
OMG. (2003). "UML 1.X and 2.x Object Management Group." from www.uml.org.
OMG (2008). OMG Systems Modelling Language (SysML) 1.1.
OMG (2009). "OMG Profile ‘UML Profile for MARTE’ 1.0."
Sanchez, J. L. F. and G. M. Acitores (2009). Modelling and evaluating real-time
       software architectures. Reliable Software Technologies - Ada-Europe 2009.
       14th Ada-Europe International Conference on Reliable Software Technologies,
       Brest, France, Springer Verlag.
Sinha, N. K., Ed. (1994). Control Systems, New Age International.
Smith, C. (1990). Performance Engineering of Software Systems, Addison Wesley.
Smith, C. (2007). Introduction to Software Performance Engineering: Origins and
       Outstanding Problems. Formal Methods for Performance Evaluation: 395-428.
Staines, T. S. (2006). Using a timed Petri net (TPN) to model a bank ATM.
       Engineering of Computer Based Systems, 2006. ECBS 2006. 13th Annual
       IEEE International Symposium and Workshop on.
Woodside, M., G. Franks, et al. (2007). The Future of Software Performance
       Engineering. Future of Software Engineering, 2007. FOSE '07, Minneapolis,
       MN.






        An Investigation Into Design Diagrams and Their
                        Implementations
                                 Alan Hayes
                         alanhayes725@btinternet.com

Supervisors          Dr Pete Thomas
                     Dr Neil Smith
                     Dr Kevin Waugh
Department/Institute Computing Department
Status               Part-time
Probation viva       After
Starting date        1st October 2005

The broad theme of this research is concerned with the application of information
technology tools and techniques to automatically generate formative feedback based
upon a comparison of two separate, but related, artefacts. An artefact is defined as a
mechanism through which a system is described. In the case of comparing two
artefacts, both artefacts describe the same system but do so through the adoption of
differing semantic and modelling constructs. For example, in the case of a student
coursework submission, one artefact would be that of a student-submitted design
diagram (using the syntax and semantics of UML class diagrams) and the second
artefact would be that of the student-submitted accompanying implementation (using
java syntax and semantics). Both artefacts represent the student’s solution to an
assignment brief set by the tutor. The design diagram describes the solution using one
set of semantic representations (UML class diagrams) whilst the implementation
represents the same solution using an alternative set (Java source code). Both artefacts
are describing the same system and represent a solution to the assignment brief. An
alternative example would be that of a student submitting an ERD diagram with an
accompanying SQL implementation.

This research aims to identify the generic mechanisms needed for a tool to be able to
compare two different, but related, artefacts and generate meaningful formative
feedback based upon this comparison. A case study is presented that applies these
components to the case of automatically generating formative assessment feedback to
the students based upon their submission. The specific area of formative feedback
being addressed is based upon a comparison between the submitted design and the
accompanying implementation. Constituent components described within each
artefact are considered to be consistent if, despite the differing modelling constructs,
they describe features that are common to both artefacts. The design (in diagrammatic
format) is viewed as prescribing the structure and function contained within the
implementation, whilst the implementation (source code) is viewed as implementing
the design whilst adhering to its specified structure and function. There are several
major challenges and themes that feed into this issue. The first is how the consistency
between a student-submitted design and its implementation can be measured in such a
way that meaningful formative feedback could be generated. This involves being able
to represent both components of the student submission in a form that facilitates their
comparison. Thomas et al [2005] and Smith et al [2004] describe a method of
reducing a student diagram into meaningful minimum components. Tselonis et al





[2005] adopt a graphical representation mapping entities to nodes and relationships to
arcs. Consequently, one component of this research addresses how the student
submitted design and its source code representation can be reduced to their constituent
meaningful components.
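
As a small illustration of what a comparison of constituent meaningful components might
involve (the component lists and names below are invented, and a realistic comparison would
need to tolerate naming differences rather than require exact matches), the C sketch below
compares the class names declared in a design with those found in an implementation and
reports those that are missing from, or extraneous to, one of the artefacts.

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical minimum components extracted from the two artefacts:
     * class names from a UML class diagram and from the Java source. */
    static const char *design_classes[] = { "Library", "Book", "Member", "Loan" };
    static const char *code_classes[]   = { "Library", "Book", "Member", "LoanRecord" };

    static int contains(const char *needle, const char *haystack[], int n) {
        for (int i = 0; i < n; i++)
            if (strcmp(needle, haystack[i]) == 0) return 1;
        return 0;
    }

    int main(void) {
        int nd = sizeof design_classes / sizeof design_classes[0];
        int nc = sizeof code_classes / sizeof code_classes[0];

        /* Design components with no counterpart in the code: "missing". */
        for (int i = 0; i < nd; i++)
            if (!contains(design_classes[i], code_classes, nc))
                printf("missing from implementation: %s\n", design_classes[i]);

        /* Code components with no counterpart in the design: "extraneous". */
        for (int i = 0; i < nc; i++)
            if (!contains(code_classes[i], design_classes, nd))
                printf("not in the design: %s\n", code_classes[i]);
        return 0;
    }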

The second challenge associated with this research addresses the problem of how to
facilitate a meaningful comparison between these representations and how the output
of a comparison can be utilised to produce meaningful feedback. This challenge is
further complicated as it is known that the student submission will contain errors.
Smith et al [2004] and Thomas et al [2005] identified that the student diagrams will
contain data that is either missing or extraneous. Thomasson et al [2006] analysed the
designs of novice undergraduate computer programmers and identified a range of
typical errors found in the student design diagrams. Additionally, Bolloju et al
[2006] analysed UML modelling errors made by novice analysts and identified a
range of typical semantic errors. Some of these errors will propagate into the
student implementation whilst some will not.

This research investigates how such analysis and classifications can be used to
support the development of a framework that facilitates the automation of the
assessment process. This work will be complemented by an analysis of six data sets
collated for this research. Each data set is comprised of a set of student diagrams and
their accompanying implementations. It is anticipated that this work will be of interest
to academic staff engaged in the teaching, and consequently assessment, of
undergraduate computing programmes. It will also be of interest to academic staff
considering issues surrounding the prevention of plagiarism. Additionally, it will be
of interest to those engaged in the field of software engineering and in particular to
those involved in the auditing of documentation and practice.

References

[1]   Higgins C., Colin A., Gray G., Symeonidis P. and Tsintsifas A. 2005 Automated
      Assessment and Experiences of Teaching Programming. In Journal on
      Educational Resources in Computing (JERIC) Volume 5 Issue 3, September 2005.
      ACM Press

[2]   Thomasson B., Ratcliffe M. and Thomas L., 2006 Identifying Novice Difficulties
      in Object Oriented Design. In Proceedings of Information Technology in
      Computer Science Education (ITiCSE ’06), June 2006, Bologna, Italy.

[3]   Bolloju N. and Leung F. 2006 Assisting Novice Analysts in Developing Quality
      Conceptual Models with UML. In Communications of the ACM, July 2006, Vol
      49, No. 7, pp 108-112

[4]   Tselonis C., Sargeant J. and Wood M. 2005 Diagram Matching for Human-
      Computer Collaborative Assessment. In Proceedings of the 9th International
      conference on Computer Assisted Assessment, 2005.







[5]   Smith N., Thomas, P. and Waugh K. (2004) Interpreting Imprecise Diagrams. In
      Proceedings of the Third International Conference in Theory and Applications of
      Diagrams. March 22-24, Cambridge, UK. Springer Lecture Notes in Computer
      Science, eds: Alan Blackwell, Kim Marriott, Atsushi Shimojima, 2980, 239-241.
      ISBN 3-540-21268-X.

[6]   Thomas P., Waugh K. and Smith N., (2005) Experiments in the Automated
      Marking of ER-Diagrams. In Proceedings of 10th Annual Conference on
      Innovation and Technology in Computer Science Education (ITiCSE 2005)
      (Lisbon, Portugal, June 27-29, 2005).







  An Investigation into Interoperability of Data Between
Software Packages used to Support the Design, Analysis and
          Visualisation of Low Carbon Buildings

                             Robina Hetherington
                         R.E.Hetherington@open.ac.uk

Supervisors          Robin Laney
                     Stephen Peake
Department/Institute Computing
Status               Fulltime
Probation viva       Before
Starting date        January 2010
This paper outlines a preliminary study into the interoperability of building design and
energy analysis software packages. It will form part of a larger study into how
software can support the design of interesting and adventurous low carbon buildings.
The work is interdisciplinary and is concerned with design, climate change and
software engineering.
Research Methodology
The study will involve a blend of research methods. Firstly the key literature
surrounding the study will be critically reviewed. A case study will look at the
modelling of built form, with reflection upon the software and processes used. The
model used in the case study will then be used to enable the analysis of data
movement between software packages. Finally conclusions regarding the structures,
hierarchies and relationships between interoperable languages used in the process will
be drawn. This will inform the larger study into how software can support the design
of interesting and adventurous low carbon buildings.
Research questions:
   1. What are the types of software used to generate building models and conduct
      the analysis of energy performance?
   2. What is the process involved in the movement of data from design software to
      energy analysis software to enable the prediction of the energy demands of
      new buildings?
   3. What are the potential limitations of current interoperable languages used to
      exchange data and visualise the built form?
Context
Software has an important role in tackling climate change; it is “a critical enabling
technology” [1]. Software tools can be used to support decision making surrounding
climate change in three ways: prediction of the medium to long term effects,
formation and analysis of adaptation strategies, and support of mitigation methods.
This work falls into the latter category: reducing the sources of greenhouse gases
through energy efficiency and the use of renewable energy sources [2].
Climate change is believed to be caused by increased anthropogenic emissions of
greenhouse gases. One of the major greenhouse gases is carbon dioxide. In the UK






the Climate Change Act of 2008 has set legally binding targets to reduce the emission
of carbon dioxide by 80% from 1990 levels by 2050 [3]. As buildings account for
almost 50% of UK carbon dioxide emissions the necessary alteration of practices
related to the construction and use of buildings will have a significant role in
achieving these targets [4]. In 2007 the UK Government announced the intention that
all new houses would be carbon neutral by 2016 in the “Building a Greener Future:
policy statement”. This is to be achieved by progressive tightening of Building
Regulations legislation over a number of years [4]. Consultations are currently taking
place on the practicalities of legislating for public sector buildings and all new non-
domestic buildings to be carbon neutral by 2018 and 2019 respectively [5]. The
changes in praxis facing the construction industry over the next 20-30 years as a
result of this legislation are profound [6].
Software used in building modelling
Architecture has gone through significant changes since the 1980s when CAD
[Computer Aided Draughting/Design] was introduced. The use of software has
significantly altered working practices and enabled imaginative and inspiring designs,
sometimes using complex geometries only achievable through the use of advanced
modelling and engineering computational techniques. However, the advances in
digital design media have created a complex web of multiple types of software,
interfaces, scripting languages and complex data models [7].
The types of software used by architects can be grouped into three main categories:
CAD software that can be used to generate 2D or 3D visualizations of buildings. This
type of software evolved from engineering and draughting practices, using command
line techniques to input geometries. This software is mainly aimed at imitating paper
based practices, with designs printed to either paper or PDF.
Visualization software, generally used in the early design stages for generating high
quality renderings of the project.
BIM [Building Information Modelling] software has been a significant development
in the last few years. BIM software contains the building geometry and spatial
relationship of building elements in 3D. It can also hold geographic information,
quantities and properties of building components, with each component as an ‘object’
recorded in a backend database. Building models of this type are key to the
calculations now required to support zero carbon designs [8]. Examples of BIM
software are Revit by Autodesk [9], ArchiCAD by Graphisoft [10], and offerings from
Bentley Systems [11].
Energy analysis software
Analysis software is used to perform calculations such as heat loss, solar gains,
lighting, acoustics, etc. This type of analysis is usually carried out by a specialist
engineer, often subsequent to the architectural design. The available tools are thus
aimed at the expert engineer, who has the knowledge required to run the simulation and
interpret its results. This means that, until recent legislative changes, there was
no need for holistic performance assessment to be integrated into design software
[12].
Calculation of energy consumption requires a model of the proposed building to make
the detailed estimates possible. Examples of expert tools that use models for the
calculation are TRNSYS [13], IES Virtual Environment [14], EnergyPlus [15]. One
tool that supports the architectural design process is Ecotect [16], which has a more
intuitive graphical interface and support to conduct a performance analysis [12].





Energy analysis is a one-way iterative process, with geometric meshes and data
transferred from the design package to the various analysis tools. Every design
iteration will (or should) involve a re-run of the environmental analysis tool [17]. The
mesh geometry requires manipulation for this movement into the analysis software
from the modelling environment, and data such as material properties need to be
re-entered, with a significant penalty in time and possible loss or corruption of data
[18][19].
Key research into interoperable languages used in the AEC [Architectural
Engineering and Construction] industry
A number of interoperable languages, relating to building designs, have been
developed since the release of version 1.0 of the XML [eXtensible Markup
Languages] standard in February 1998. They include visualisation schemas mainly
used for as the source for the display of models: X3D[eXtensible 3D], based on
VRML [Virtual Reality Modeling Language], CityGML for the representation of 3D
urban objects and COLLADA [COLLAborative Design Activity]. The ifcXML
[Industry Foundation Classes eXtensible Markup Language] specification, developed
by the IAI [International Alliance for Interoperability], was designed to facilitate the
movement of information from and between BIM software. It was designed in a
“relational” manner, as a result of the BIM database concept. Accordingly there is
concern about the potential file size and complexity of the standard arising from the
XML format and the amount of data it can contain [20] [21]. Also, the seamless
interoperability it is intended to support has proved to be elusive. Take-up has been
slow and incomplete with software companies not always supportive [22]. A
language designed specifically for interchange of data between design modelling
environments and energy analysis packages is gbXML [Green Building eXtensible
Markup Language]. In comparison with ifcXML it is considerably simpler and easier
to understand [23]. However, its limitations are evident in the geometric detail
contained in the file, which inhibits transfer back to the design package [17].
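
Purely to illustrate the kind of information such exchange files carry (the element names
below are simplified inventions, not the actual gbXML or ifcXML schemas), the following C
fragment writes a minimal XML description of a single space; it is re-entering exactly this
sort of data by hand, when a file cannot be round-tripped, that incurs the time penalty
noted above.

    #include <stdio.h>

    /* Writes a tiny, simplified XML description of one building space.
     * The vocabulary is invented for illustration only. */
    int main(void) {
        FILE *f = fopen("space.xml", "w");
        if (f == NULL) {
            perror("space.xml");
            return 1;
        }
        fprintf(f, "<building name=\"small house\">\n");
        fprintf(f, "  <space id=\"living-room\">\n");
        fprintf(f, "    <area units=\"m2\">24.5</area>\n");
        fprintf(f, "    <surface type=\"external-wall\" uValue=\"0.25\"/>\n");
        fprintf(f, "  </space>\n");
        fprintf(f, "</building>\n");
        fclose(f);
        return 0;
    }
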
Next stage – a case study
This paper has set the case study in context and outlined the key research in the area of
interoperability in AEC projects. In the next stage a small house will be designed in
Revit and the environmental design analysed in Ecotect to gain experience in using the
tools and enable reflection on the software and procedures involved. ifcXML and
gbXML files will be exported and analysed.
Future work
The software packages used in this study are all developed by commercial organizations,
typically with an incremental, yearly update. New software, such as Ecotect, is often
brought in from an independent developer. However, open platforms are generally
considered to “promote innovation and diversity more effectively than proprietary
ones” [24]. In the field of climate change, given the profound threat to humanity, a
community approach is seen as potentially a better way forward [25]. Future work
will look at how building design software may evolve to meet the challenge of
designing interesting and beautiful low carbon buildings.
References
[1]   S.M. Easterbrook, “First international workshop on software research and climate change,”
      Proceeding of the 24th ACM SIGPLAN conference companion on Object oriented programming
      systems languages and applications - OOPSLA '09, Orlando, Florida, USA: 2009, p. 1057.
[2]   S. Peake and J. Smith, Climate change : from science to sustainability, Milton Keynes






       [England]; Oxford: Open University; Oxford University Press, 2009.
[3]    Great Britain, Climate Change Act of 2008, 2008.
[4]    Department for Communities and Local Government, “Building a Greener Future: policy
       statement,” Jul. 2007.
[5]    Zero Carbon Hub, “Consultation on Zero Carbon Non-Domestic Buildings”
       http://www.zerocarbonhub.org/events_details.aspx?event=3 [Accessed January 28, 2010].
[6]    T. Oreszczyn and R. Lowe, “Challenges for energy and buildings research: objectives, methods
       and funding mechanisms,” Building Research & Information, vol. 38, 2010, pp. 107-122.
[7]    R. Oxman, “Digital architecture as a challenge for design pedagogy: theory, knowledge, models
       and medium,” Design Studies, vol. 29, 2008, pp. 99-120.
[8]    E. Krygiel and B. Nies, Green BIM : successful sustainable design with building information
       modeling, Indianapolis Ind.: Wiley Pub., 2008.
[9]    Autodesk, “Revit Architecture Building Information Modeling Software - Autodesk,” Revit
       Architecture Building Information Modeling Software - Autodesk
       http://usa.autodesk.com/adsk/servlet/pc/index?id=3781831&siteID=123112 [Accessed April 26,
       2010].
[10]   Graphisoft, “ArchiCAD 13 - Overview,” ArchiCAD 13 - Overview
       http://www.graphisoft.com/products/archicad/ [Accessed April 26, 2010].
[11]   Bentley, “Construction Software | Architectural Software | Building Information Modeling,”
       Construction Software | Architectural Software | Building Information Modeling
       http://www.bentley.com/en-US/Solutions/Buildings/ [Accessed April 26, 2010].
[12]   A. Schlueter and F. Thesseling, “Building information model based energy/exergy performance
       assessment in early design stages,” Automation in Construction, vol. 18, 2009, pp. 153-163.
[13]   Transsolar Energietechnik GmbH, “TRANSSOLAR Software | TRNSYS Overview,”
       TRANSSOLAR Software | TRNSYS Overview
       http://www.transsolar.com/__software/docs/trnsys/trnsys_uebersicht_en.htm [Accessed April 26,
       2010].
[14]   IES, “IES - Sustainable 3D Building Design, Architecture Software - Integrated Environmental
       Solutions,” IES - Sustainable 3D Building Design, Architecture Software - Integrated
       Environmental Solutions http://www.iesve.com/content/default.asp?page= [Accessed April 26,
       2010].
[15]   U.S. Department of Energy, “Building Technologies Program: EnergyPlus,” Building
       Technologies Program: EnergyPlus http://apps1.eere.energy.gov/buildings/energyplus/
       [Accessed April 26, 2010].
[16]   Autodesk, “Autodesk - Autodesk Ecotect Analysis,” Autodesk - Autodesk Ecotect Analysis
       http://usa.autodesk.com/adsk/servlet/pc/index?siteID=123112&id=12602821 [Accessed April 26,
       2010].
[17]   N. Hamza and M. Horne, “Building Information Modelling: Empowering Energy Conscious
       Design,” 3rd Int’l ASCAAD Conference on Em‘body’ing Virtual Architecture, Alexandria,
        Egypt.
[18]   I. Pritchard and E. Willars, Climate Change Toolkit, 05 Low Carbon Design Tools, RIBA, 2007.
[19]   A. Lawton and D. Driver, “Autodesk Sustainable Design Curriculum 2010 – Lesson 1,” 2010.
[20]   V. Bazjanac, “Building energy performance simulation as part of interoperable software
       environments,” Building and Environment, vol. 39, 2004, pp. 879-883.
[21]   R. Howard and B. Bjork, “Building information modelling – Experts’ views on standardisation
       and industry deployment,” Advanced Engineering Informatics, vol. 22, 2008, pp. 271-280.
[22]   R. Jardim-Goncalves and A. Grilo, “Building information modeling and interoperability,”
       Automation in Construction, 2009.
[23]   B. Dong, K. Lam, Y. Huang, and G. Dobbs, “A comparative study of the IFC and gbXML
       informational infrastructures for data exchange in computational design support
       environments,” Tenth International IBPSA Conference, Beijing: IBPSA China, 2007.
[24]   S. Johnson, “Rethinking a Gospel of the Web,” The New York Times
       http://www.nytimes.com/2010/04/11/technology/internet/11every.htm?pagewanted=print
       [Accessed April 26, 2010].
[25]   A.A. Voinov, C. DeLuca, R.R. Hood, S. Peckham, C.R. Sherwood, and J.P.M. Syvitski, “A
       Community Approach to Earth Systems Modeling,” Eos, Transactions American Geophysical
       Union, vol. 91, 2010, p. 117.







 Understanding Object-Relational Impedance Mismatch: A
                            Framework Based Approach
                                                             Chris Ireland
                                                  cji26@student.open.ac.uk


Supervisors                   David Bowers
                              Mike Newton
                              Kevin Waugh
Department/Institute          Computing
Status                        5th Year, Part-time
Probation viva                Completed
Starting date                 1 October 2005


Research Question
Object-relational impedance mismatch is the label used to classify the problems faced by the developer
of an object-oriented application that must use a relational database for storage. What is object-
relational impedance mismatch, how do we know whether a particular strategy is the most appropriate way
to address the problems it presents, and what can be done to improve the situation?


Background
In [1] I describe a framework and classification (Figure 1) that provide new insights into the object-
relational mapping (ORM) strategies used to address problems of an object-relational impedance
mismatch.
[Figure 1 (diagram): the framework aligns an object-oriented silo and a relational silo at four
levels, naming the mismatch at each level and, in brackets, the kind of strategy that addresses it:
    Concept level:  Object Orientation vs. Relational  - Conceptual Mismatch (Reconciliation)
    Language level: OOPL (e.g. Java)   vs. SQL         - Representation Mismatch (Pattern)
    Schema level:   Application        vs. DB Schema   - Emphasis Mismatch (Mapping)
    Instance level: Object (state)     vs. Row         - Instance Mismatch (Transformation)]




         Figure 1 - My Conceptual Framework and Classification of Impedance Mismatch
What is not clear is how one uses my framework to understand an ORM strategy: where does one
start, how does one proceed, what can one expect to discover, and how do we understand changes that
may improve the situation? Figure 2 provides an overview of one process for using my framework. I





describe this process in more detail in [5]. The process and framework have been validated by
comparing and contrasting the outcomes with those possible using the classification of Fussell [6].




                             Figure 2 - My Framework Based Approach
The framework may also be used to understand (possible) solutions to problems of an object-relational
impedance mismatch. At the last CRC PhD Student Conference I set an objective to understand the
consequences of changes introduced in Object-Relational SQL (OR-SQL) [7] using my framework.
OR-SQL is a language level change and may be one solution to problems of an object-relational
impedance mismatch. This work is complete and the results have been published in [8]. I found that
OR-SQL does not improve the situation and that the term relational database is now overloaded.


So what…
ORM strategies are not new. There is a body of literature (e.g. Keller [2], Ambler [3], Hohenstein [4])
that provides a description and analysis of each ORM strategy. This analysis is focused on the practical
consequences of combining object and relational artefacts rather than understanding the underlying
issues with an ORM strategy. Achieving an understanding of the underlying issues is the objective of
my framework and process. Analysis using my framework asks that one thinks about an ORM strategy
in a new way. In so doing it helps to provide new insights into an ORM strategy, highlight new issues,
understand cause and effect, and suggest improvements to an ORM strategy.


In [1] (which was awarded a best paper at the conference), [5] and [8] I have shown that the framework
and process do provide new insights. These insights provide an opportunity to improve an ORM
strategy and the context in which that ORM strategy operates, and to understand how best to make use






of new features in OR-SQL. Such information is useful to standards bodies, tools vendors and those
who define an ORM strategy using SQL or OR-SQL. Thinking about the consequences of an ORM
strategy provides information necessary to choose between alternatives. This information is invaluable
to those who implement an ORM strategy.


The Problem
The process provides guidance on the use of my framework, but there is still a need for clear
guidance on how to compare object and relational representations. What is the basis for a comparison
and how might we go about making a comparison?


Current Research Activities
I am exploring how we might investigate the different kinds of impedance mismatch described in Figure 1.
To that end I am developing a technique based on equivalence. Problems of an impedance mismatch
exist because object and relational representations are different, but how are they equivalent?


An object and a relational design reflect aspects of a universe of discourse ([9], p2-1). That universe of
discourse provides a point of reference common to both object and relational representations. Whilst
each design uses a different conceptual framework, language and structure(s) to describe that universe
they are representations of the same universe. So, whilst object and relational representations are
different, if we are not to lose information in a round-trip between an object-oriented application and a
relational database they must be equivalent descriptions of that universe. The problem is how do we
describe that universe without favouring one conceptual framework over another?


I introduce a third silo into the framework: the reference silo. The reference silo is currently theoretical
and the artefacts within it are an ideal. In this silo there is a reference concept level, a reference language level,
a reference schema level and a reference instance level. Each level provides artefacts for the description
of some aspect of a universe of discourse. This description does not need to be perfect, but as a
minimum it must be a superset of those semantics and structures that may be described using object
and relational artefacts.
    [Figure 3 (Venn diagram): the Object Schema set, in which identity is implicit and independent of
    attribute values (the identity of an object), and the Relational Schema set, in which identity is
    explicit and based on the value of a tuple (the identity of a row), overlap only in identifying a
    particular occurrence of an entity.]

    Figure 3 - Exploring Identity Between Object and Relational Representations of an Entity







Employing the reference silo I can then explore those semantics and structures of a reference
representation that are captured in an object and a relational representation. Each representation is
shown as a set in a Venn diagram (e.g. Figure 3) where, depending on the level of the framework, a
set may contain conceptual building blocks, language structures, design representations or data formats.


In Figure 3 I provide one example that shows that there is little in common between object and
relational representations of identity at the language level. My argument is that only those semantics
and structures that are equivalent, i.e. captured in both representations, can form part of a no-
loss transformation between object and relational representations. It follows that current pattern
strategies for mapping identity between object and relational representations (e.g. Blaha [10], p420,
Keller [2], p21 and Fowler, in Ambler [11], p285) are at best misguided.
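
To make the identity mismatch summarised in Figure 3 concrete, the following sketch contrasts
implicit, reference-based object identity with explicit, value-based relational identity. It is purely
illustrative: the Customer class, the customerId key field and the method names are assumptions made
for this example and are not artefacts of my framework.

import java.util.Objects;

// Illustrative sketch: object identity is implicit (the reference itself), whereas
// relational identity is explicit (the value of a key column within a row).
public class IdentityMismatch {

    // An object has identity independent of its attribute values.
    static class Customer {
        String name;                       // attribute values may change...
        Customer(String name) { this.name = name; }
        // ...but the object's identity is the reference itself (==), unless
        // equals()/hashCode() are overridden to simulate value-based identity.
    }

    // A row's identity is the value of (part of) the tuple, e.g. a primary key column.
    static class CustomerRow {
        final long customerId;             // e.g. PRIMARY KEY column of a CUSTOMER table
        String name;
        CustomerRow(long customerId, String name) {
            this.customerId = customerId;
            this.name = name;
        }
        @Override public boolean equals(Object o) {
            return o instanceof CustomerRow
                && ((CustomerRow) o).customerId == customerId;
        }
        @Override public int hashCode() { return Objects.hash(customerId); }
    }

    public static void main(String[] args) {
        Customer a = new Customer("Smith");
        Customer b = new Customer("Smith");
        System.out.println(a == b);        // false: distinct objects despite equal attributes

        CustomerRow r1 = new CustomerRow(42L, "Smith");
        CustomerRow r2 = new CustomerRow(42L, "Jones");
        System.out.println(r1.equals(r2)); // true: same key value, hence the "same" row
    }
}

A no-loss mapping must therefore decide how the implicit identity on the object side is to be
represented by an explicit key on the relational side (or vice versa), which is the kind of information
that, as Figure 3 suggests, the two representations do not share.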


The work on equivalence will enhance my process (Figure 2) and provide a more robust approach to
exploring individual ORIM problems and ORM strategies. I expect that this will also open up new
avenues for research into the nature and development of the reference silo.


Remaining Work
I have provided a convincing and published body of evidence to support my claims for the framework.
The work on equivalence provides the final piece of work for my thesis and will open up new avenues
for future research. The equivalence work necessary for my thesis will be complete by the summer
of 2010; if time permits I would like to publish it before submitting my thesis in the summer of 2011.


References
1.    Ireland, C., Bowers, D., Newton, M., Waugh, K.: A Classification of Object-Relational Impedance
      Mismatch. In: Chen, Q., Cuzzocrea, A., Hara, T., Hunt, E., Popescu, M. (eds.): The First International
      Conference on Advances in Databases, Knowledge and Data Applications, Vol. 1. IEEE Computer Society,
      Cancun, Mexico (2009) p36-43
2.    Keller, W.: Mapping Objects to Tables: A Pattern Language. In: Buschmann, F., Riehle, D. (eds.): European
      Conference on Pattern Languages of Programming (EuroPLoP), Irsee, Germany (1997)
3.    Ambler, S.: Mapping Objects to Relational Databases: O/R Mapping In Detail. (2006)
4.    Hohenstein, U.: Bridging the Gap between C++ and Relational Databases. In: Cointe, P. (ed.): European
      Conference on Object-Oriented Programming, Lecture Notes in Computer Science, Vol. 1098. Springer-
      Verlag, Berlin (1996) 398-420
5.    Ireland, C., Bowers, D., Newton, M., Waugh, K.: Understanding Object-Relational Mapping: A Framework
      Based Approach. International Journal On Advances in Software 2 (2009)
6.    Fussell, M.L.: Foundations of Object Relational Mapping. Vol. 2007. ChiMu Corporation (1997)
7.    Eisenberg, A., Melton, J.: SQL: 1999, formerly known as SQL3. SIGMOD Record 28 (1999) 119-126
8.    Ireland, C., Bowers, D., Newton, M., Waugh, K.: Exploring the use of Mixed Abstractions in SQL:1999 - A
      Framework Based Approach. In: Chen, Q., Cuzzocrea, A., Hara, T., Hunt, E., Popescu, M. (eds.): The
      Second International Conference on Advances in Databases, Knowledge and Data Applications, Vol. 1.
      IEEE Computer Society, Les Menuires, France (2010) TBA
9.    Griethuysen, J.J.v. (ed.): Concepts and Terminology for the Conceptual Schema and the Information Base.
      ISO, New York (1982)
10.   Blaha, M.R., Premerlani, W.J., Rumbaugh, J.E.: Relational database design using an object-oriented
      methodology. Communications of the ACM 31 (1988) 414-427
11.   Ambler, S.W.: Agile Database Techniques - Effective Strategies for the Agile Software Developer. Wiley
      (2003)








 “Privacy-Shake”, a Haptic Interface for Managing Privacy
     Settings in Mobile Location Sharing Applications.

                                Lukasz Jedrzejczyk
                            l.jedrzejczyk@open.ac.uk

Supervisors          Arosha Bandara
                     Bashar Nuseibeh
                     Blaine Price
Department/Institute Computing Dept.
Status               Fulltime
Probation viva       After
Starting date        June 2008


Abstract
I describe the “Privacy-Shake”, a novel interface for managing coarse-grained privacy
settings. I built a prototype that enables users of Buddy Tracker, an example location
sharing application, to change their privacy preferences by shaking their phone. Users
can enable or disable location sharing and change the level of granularity of disclosed
location by shaking and sweeping their phone. In this poster I present and motivate
my work on Privacy-Shake and report on a lab-based evaluation of the interface with
16 participants.

1.     INTRODUCTION
The proliferation of location sharing applications raises several concerns related to
personal privacy. Some solutions involving location privacy policies have been
suggested (e.g., [1]). However, prior research shows that end-users have difficulties in
expressing and setting their privacy preferences [2,3]. Setting privacy rules is a time-
consuming process, which many people are unwilling to do until their privacy is
violated. Moreover, privacy preferences vary across contexts, and it is hard to
define a privacy policy that reflects the dynamic nature of our lives. I see this as a
strong motivation to design interfaces that help users update their privacy settings as a
consequence of their daily tasks within the system. The underlying requirement for my
interface is to provide an efficient, heads-up means of managing location privacy
that avoids the pitfall of privileging configuration over action [4].
In order to fulfil this requirement I developed the Privacy-Shake, a haptic interface [5]
supporting ad-hoc privacy management. To evaluate the Privacy-Shake interface I
conducted a lab-based study to examine its effectiveness and explore users' reactions
to the technology. I also evaluated several usability aspects of Privacy-Shake and
compared its performance against a graphical user interface. My study confirmed the
potential of haptic interfaces for performing simple privacy tasks and showed that
Privacy-Shake can be faster than the GUI. However, the subjective results suggest
further work on improving the interface, such as support for individual calibration and
personalized gestures for better efficiency.







2.     THE PRIVACY-SHAKE SYSTEM
The current prototype of Privacy-Shake is developed in Java and works on Android-
powered mobile devices. It uses the built-in accelerometer to monitor the current
position of the device. The application runs in the background, saving the time needed to
switch the phone on.
The current prototype supports the following settings: visibility (the user can
enable/disable location sharing) and granularity (changing the level of granularity of
disclosed location from exact location to city-level location).
2.1    Haptic interaction
Due to the dynamic nature of the mobile device, every action has to be initiated by a
dynamic, vertical shake. This is required to distinguish the action from the noise
generated by the user's daily movements, e.g. walking, jogging or using a lift. Once the
system recognizes the movement, vibration feedback is provided to confirm that the
system is ready. Once the system is initiated, a user can change privacy settings by
performing one of the following actions:
•       Vertical movement enables location sharing (Figure 1a),
•       Horizontal movement (left and right) disables location sharing (Figure 1b),
•       By moving the phone forward, a user can change the granularity of disclosed
location to the city level (Figure 1c),
•       A user instructs the system to share exact location by bringing the phone close
to his or her body (Figure 1d).
A successful action is confirmed by a short vibration (whose length depends on the action)
and an optional auditory message (e.g. the natural language message “Anyone can see you”
when the user enables location sharing).




Figure 1. Privacy-Shake in action. Arrows present the direction of movement
that triggers a privacy-management task.
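
The paper does not include the recognition code; as a rough, hypothetical illustration of the kind of
logic involved, the sketch below maps accelerometer samples onto the four actions described above.
The thresholds, axis conventions and class names are assumptions made for this example and are not
taken from the actual Privacy-Shake prototype, which runs in the background on Android and uses
the built-in accelerometer.

// Illustrative sketch only: mapping accelerometer readings to Privacy-Shake actions.
// Thresholds and axis conventions are assumed; the real prototype uses the Android sensor API.
public class GestureSketch {

    enum Action { ENABLE_SHARING, DISABLE_SHARING, CITY_LEVEL, EXACT_LOCATION, NONE }

    private static final double SHAKE_THRESHOLD = 15.0; // m/s^2, assumed

    // A strong vertical (y-axis) acceleration initiates the interaction, so that
    // ordinary movements such as walking or jogging do not trigger it by accident.
    static boolean isInitiationShake(double ax, double ay, double az) {
        return Math.abs(ay) > SHAKE_THRESHOLD;
    }

    // After initiation (and vibration feedback), the dominant axis of the next
    // deliberate movement selects the privacy-management action.
    static Action classify(double ax, double ay, double az) {
        if (Math.abs(ay) > SHAKE_THRESHOLD) return Action.ENABLE_SHARING;   // vertical
        if (Math.abs(ax) > SHAKE_THRESHOLD) return Action.DISABLE_SHARING;  // horizontal
        if (az > SHAKE_THRESHOLD)           return Action.CITY_LEVEL;       // forward, away from body
        if (az < -SHAKE_THRESHOLD)          return Action.EXACT_LOCATION;   // towards the body
        return Action.NONE;
    }

    public static void main(String[] args) {
        System.out.println(isInitiationShake(0.5, 18.0, 0.2)); // true: vertical shake
        System.out.println(classify(17.0, 0.3, 0.1));          // DISABLE_SHARING
    }
}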

3.     IN-LAB EVALUATION
I conducted a lab-based trial of the Privacy-Shake interface to evaluate its usability
and to examine both the potential and the vulnerabilities of the current prototype.
3.1    Method
I recruited 16 participants aged 23 to 45 for the study, 8 women and 8 men.
Most of them had prior experience with motion-capture interaction, mainly from
playing the Nintendo Wii. Eleven participants were graduate students, 4 were
recruited from the university's staff and the remaining participant was recruited from
outside the university. Participants were asked to complete the following privacy
management tasks using Privacy-Shake (results presented in Figure 2):
T1.     Enable location sharing,
T2.     Disable location sharing,






T3.      Change the granularity of disclosed location to (a) exact location (building
level), (b) city level,
T4.      Disable location sharing using the GUI.
The following measures were recorded:
•    Time to perform a task – from the moment the user started the initiation
     movement to the vibration confirming the action,
•    Number of successfully completed tasks,
•    Time to disable location sharing using the GUI.
Participants took part in the study individually; at the beginning of each session I
introduced the Privacy-Shake concept and the purpose of the study. Users were
presented with a short demo of the system and were given a chance to play with the
interface prior to performing four privacy management tasks using Privacy-Shake.
Each participant had three attempts to perform each task. At the end of each session I
asked participants to complete a questionnaire to rate Privacy-Shake.
3.2    Results
Twelve participants reported that learning how to use Privacy-Shake was easy (2
users reported that it was difficult), and 12 said that it was also easy to remember
how to use, as the interaction is simple and intuitive. However, 4 users said that
they would not like to use it due to the awkwardness of the interface and the potential
harm it might cause, e.g. accidentally pushing people on a crowded bus.




Figure 2. Bar chart presents the percentage of successfully completed tasks
(efficiency) during the study.
Four participants reported that using Privacy-Shake was annoying and six of them
said that it caused frustration, which is related to the problems they experienced with
the interface. Only five users managed to successfully complete every privacy
management task using Privacy-Shake. Three users could not disable their location
sharing and nine users had problems changing the granularity of disclosed location.
The biggest difficulty users experienced was with task T3b: only three users
successfully completed the task three times, and more than half of all attempts to
perform this task were unsuccessful (58%). Only task T1 was successfully completed
by all users; thirteen participants disabled location sharing using Privacy-Shake and
ten of them successfully changed the granularity of location to city level. Two users
successfully completed 11 of 12 attempts, which was the best result during the study.







Overall, 58% of all attempts were successful. I observed that females performed slightly better
when using Privacy-Shake, with 64% efficiency versus 53% for males.

4.     CONCLUSIONS AND FUTURE WORK
I presented the concept and initial results of the evaluation of Privacy-Shake, a novel
interface for 'heads-up' privacy management. The chosen demographic was not broad,
but the study helped me identify both social and technical issues related to the
interface. The main issues I found were the lack of individual calibration and of
support for more discreet movements, which highlights the future research agenda for
my work on Privacy-Shake. Though the actual efficiency is not ideal, the comparison
between the mean times for performing tasks T2 (6 seconds) and T4 (18 seconds)
shows that a haptic interface can be used to perform some basic privacy
management tasks faster than a traditional GUI. The Privacy-Shake concept
received positive feedback, which encourages me to continue the work on
improving the interface and enhancing the user experience. Further work is also
needed to extend the functionality of Privacy-Shake by implementing new gestures
for managing group settings or expressing more fine-grained preferences.

5.     REFERENCES
[1]     G. Myles, A. Friday, and N. Davies, “Preserving Privacy in Environments
with Location-Based Applications,” IEEE Pervasive Computing, vol. 2, 2003, pp. 56-
64.
[2]     L.F. Cranor and S. Garfinkel, Security and usability: designing secure systems
that people can use, O'Reilly Media, Inc., 2005.
[3]     N. Sadeh, J. Hong, L. Cranor, I. Fette, P. Kelley, M. Prabaker, and J. Rao,
“Understanding and capturing people's privacy policies in a people finder
application,” The Journal of Personal and Ubiquitous Computing, vol. 13, Aug. 2009,
pp. 401-412.
[4]     S. Lederer, I. Hong, K. Dey, and A. Landay, “Personal privacy through
understanding and action: five pitfalls for designers,” Personal Ubiquitous Computing,
vol. 8, 2004, pp. 440-454.
[5]     S. Robinson, P. Eslambolchilar, and M. Jones, “Sweep-Shake: finding digital
resources in physical environments,” Proc. of Mobile HCI'09, ACM, 2009, p.12.







Designing a Climate Change Game for Interactive Tabletops

                                Stefan Kreitmayer
                             stefan@kreitmayer.com

Supervisors              Dr. Robin Laney
Department/Institute     Computing
Status                   Visiting Research Student
Probation viva           n.a.
Starting date            February – June 2010


During my 4-month visiting studentship I am developing a game that utilises the
affordances of multi-user interaction with tabletop surfaces for a persuasive goal.
Players' beliefs about some of the risks of man-made global climate destabilisation
should be influenced in a way that supports more responsible behaviour.

Persuasive games for personal computers are widespread in practice [1][2], and there
is abundant literature suggesting theoretical frameworks and design guidelines [3].
Similarly, designing applications for interactive tabletops is an active field of
research. However, there are currently few persuasive games for interactive
tabletops, and the design issues that emerge have not been fully addressed in the literature.

With a growing awareness of the persuasive potential of computer games, and
interactive tabletops becoming increasingly affordable, it is to be expected that more
game designers will address this medium in the near future. Beyond usability
questions, designers will face questions resulting from contradicting paradigms.
While the affordances of tabletops for supporting multi-user collaboration are
constantly highlighted [4], the computer game area is only just emerging from a
long tradition of single-user and competitive gameplay [5]. Currently the vast majority
of persuasive games are designed for browsers and mobile phones, aimed at single
users. Fogg [6] explains fundamental differences in the way persuasion works in
single-user interaction as opposed to group interaction, and this can be incorporated
into design for tabletops.

This research aims to contribute towards understanding some of the apparent points of
friction between two media and two areas of research. With this in mind, my research
question can be summarised as follows:

Do players perceive a game's moral message differently depending on whether they
engage in collaborative, cooperative, or competitive gameplay?

As the single message of the game, I chose from the vast climate change discourse a
fact that is commonly accepted to be true, can be easily conveyed to a broad
audience in a short amount of time, and yet is not over-advertised in the
media. The message is as follows: architectural designs with most of the window area
facing the sun help to save heating energy, thereby supporting CO2 mitigation and
lowering the risk of climate change effects.





I am planning to develop three versions of the tabletop game which all share the same
interface, aesthetic, mechanics, and message. Differences should focus on the
supported gameplay: collaborative, cooperative, or competitive, respectively. Here we
define the three concepts according to [5]: Collaborative gameplay implies that goals,
rewards, and penalties are shared among players. Cooperative gameplay differs in that
each player ultimately wants to reach their individual goal and reward, but they may
occasionally choose to collaborate if the collaboration supports their individual goal.
Competitive gameplay means that “the goals of the players are diametrically opposed”
[5]. For the sake of simplicity all three versions of the game are designed for two
players.
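
As a rough illustration of how the three versions could share the same mechanics while differing
only in how goals and rewards are distributed, the sketch below parameterises the scoring of a
two-player round by gameplay mode. The scoring rules, class and method names are invented for
this example and are not taken from the actual game design.

// Illustrative sketch only: one shared game mechanic, three reward schemes.
public class GameplayModes {

    enum Mode { COLLABORATIVE, COOPERATIVE, COMPETITIVE }

    // Score a round in which each player has saved some amount of heating energy by
    // orienting windows towards the sun; only the reward distribution differs by mode.
    static int[] scoreRound(Mode mode, int savedByPlayer1, int savedByPlayer2) {
        switch (mode) {
            case COLLABORATIVE: // goals, rewards and penalties are shared
                int shared = savedByPlayer1 + savedByPlayer2;
                return new int[]{shared, shared};
            case COOPERATIVE:   // individual goals, but joint effort still pays off a little
                int bonus = Math.min(savedByPlayer1, savedByPlayer2) / 2; // assumed incentive to collaborate
                return new int[]{savedByPlayer1 + bonus, savedByPlayer2 + bonus};
            case COMPETITIVE:   // diametrically opposed goals: one player's gain is the other's loss
            default:
                return new int[]{savedByPlayer1 - savedByPlayer2, savedByPlayer2 - savedByPlayer1};
        }
    }

    public static void main(String[] args) {
        int[] scores = scoreRound(Mode.COLLABORATIVE, 30, 10);
        System.out.println(scores[0] + " / " + scores[1]); // 40 / 40
    }
}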

A quantitative user study will be conducted to assess the different impacts on players'
opinions, depending on which version of the game they have played.
Experiments could take place in a public space or in the laboratory. I am planning an
experiment with 30 pairs of players, divided into 3 balanced groups, each group
engaging with a different type of gameplay: 10 pairs play the collaborative game, 10
pairs play the cooperative game, and 10 pairs play the competitive game. Before and
after playing, players should answer questionnaires similar in content to those in the
American Climate Values Survey[7]. Using a Likert scale, results can be analysed
quantitatively. For more qualitative results, a second experiment could be done with
the same participants at the same place and time. After a pair has played their game
and completed the questionnaires, they are invited to play the other games as well and
give statements about their impressions of whether and how their opinions have
changed in relation to different types of gameplay.

References:

[1] http://persuasivegames.com/

[2] http://www.valuesatplay.org/

[3] I. Bogost. Persuasive Games: The Expressive Power of Videogames. MIT Press,
2007

[4] E. Hornecker, P. Marshall, N. Dalton, Y. Rogers. Collaboration and interference:
Awareness with mice or touch input. In: Proceedings of the ACM 2008 conference on
Computer supported cooperative work, 8-12 Nov 2008, San Diego, CA, USA.

[5] J. P. Zagal, J. Rick, I. Hsi. Collaborative games: Lessons learned from board
games. SIMULATION & GAMING, Vol. 37 No. 1, March 2006 24-40. Sage
Publications

[6] B. J. Fogg. Persuasive Technology: Using Computers to Change What We Think
and Do. Morgan Kaufmann, 2003

[7] ECOAMERICA.NET. The American Climate Values Survey.
Available at http://www.ecoamerica.org/docs/ecoAmerica_ACVS_Summary.pdf
Last Accessed 26 Mar 2010.








      REASONING ABOUT FLAWS IN SOFTWARE DESIGN:
               DIAGNOSIS AND RECOVERY

                                 TAMARA LOPEZ
                              T.LOPEZ@OPEN.AC.UK


       Supervisors    Marian Petre, Charles Haley and Bashar Nuseibeh
       Department     Computing
            Status    Full-time
    Probation Viva    Before
     Starting Date    February 2010

Since its diagnosis at the 1960s NATO conferences as one of the key problems in
computing [1, 2], the provision of reliable software has been a core theme in software
engineering research. One strand of this research analyzes software that fails, while
a second develops and tests techniques for ensuring software success.

Despite these efforts, the threat of failure and the quest for a multivalent yet
comprehensive “sense” of quality [2] remain powerful drivers for research and
provocative tropes in anecdotal accounts of computing [3]. However, current analytical
approaches tend to result in overly broad accounts of why software fails or in
overly narrow views about what is required to make software succeed. This suggests
a need for a different approach toward the study of failure that can address
the complexities of large scale “systems-of-systems” [4, 5], while accounting for the
effects and trajectories of specific choices made within software initiatives.

To address this gap, this research asks: How does failure manifest in actual
software development practice? What constitutes a flaw, and what are the conditions
surrounding its occurrence and correction? What can adopting a situational
orientation tell us more generally about why some software fails and other software
succeeds?

                                   Background

Within computing literature, failure analysis typically takes two perspectives:








     • Systemic analyses identify weak elements in complex organizational,
       operational and software systems. Within these systems, individual or
       multiple faults become active at a moment in time or within a clearly
       bounded interval of time, and result in catastrophic or spectacular
       operational failure [6, 7]. Alternatively, software deemed “good enough” is
       released into production with significant problems that require costly
       maintenance, redesign and redevelopment [8, 5].

     • Means analyses treat smaller aspects or attributes of software
       engineering as they contribute to the goal of creating dependable software [4].
       These studies develop new or test existing techniques to strengthen all
       stages of development such as requirements engineering [9], architectural
       structuring [10], testing and maintenance [11] and verification and validation [12].

Systemic analyses produce case studies and often do not conclude with specific,
precise reasons for failure. Instead they retrospectively identify the system or
subsystem that failed, and provide general recommendations for improvement going
forward. Even when they do isolate weaknesses in the processes of software creation
or in particular software components, they do not produce general frameworks or
models that can be extended to improve software engineering practice.

Means analyses employ a range of methods including statistical analysis, program
analysis, case study development, formal mathematical modeling and systems analysis.
Frequently, they examine a single part of the development process, with a
corresponding focus on achieving a single dependability mean [4]. The studies are often
experimental, applying a set of controlled techniques to existing bodies of
software in an effort to prove, verify and validate that software meets a quantifiable,
pre-determined degree of “correctness”.


                                  Methodology

This research will produce an analysis of the phenomenon of failure that lies
somewhere between the broad, behavioral parameters of systemic analyses and the
narrowly focused goals of means analyses. To do this, it will draw upon recent
software engineering research that combines the socially oriented qualitative
approaches of computer supported cooperative work (CSCW) with existing software
analysis techniques to provide new understandings of longstanding problems in
software engineering. In one such group of studies, de Souza and collaborators
have expanded the notion of dependency beyond its technical emphasis on the
ways in which software components rely on one another, demonstrating that human
and organizational factors are also coupled to and expressed within software
source code [14, 15]. In a study published in 2009, Aranda and Venolia made a case
for developing rich bug histories using qualitative analyses in order to reveal the
complex interdependencies of social, organizational and technical knowledge that
influence and inform software maintenance [16].

In the manner of this and other cooperative and human aspects of software
engineering (CHASE) work, the research described here will apply a combination of
analytic and qualitative methods to examine the role of failure in the software
development process as it unfolds. Studies will be designed to allow for analysis
and examination of flaws within a heterogeneous artifact universe, with particular
emphasis given to the interconnections between technical workers and artifacts.
Ethnographically informed techniques will be used to deepen understanding about
how the selected environments operate, and about how notions of failure and
recovery operate within the development processes under investigation.




                                     References
[1] P. Naur and B. Randell, “Software engineering: Report on a conference sponsored by the
    NATO Science Committee Garmisch, Germany, 7th to 11th October 1968,” NATO Science
    Committee, Scientific Affairs Division NATO Brussels 39 Belgium, Tech. Rep., January
    1969. [Online]. Available: http://homepages.cs.ncl.ac.uk/brian.randell/NATO/
[2] J. Buxton and B. Randell, “Software engineering techniques: Report on a conference
    sponsored by the NATO Science Committee Rome, Italy, 27th to 31st October 1969,” NATO
    Science Committee, Scientific Affairs Division NATO Brussels 39 Belgium, Tech. Rep.,
    April 1970. [Online]. Available: http://homepages.cs.ncl.ac.uk/brian.randell/NATO/
[3] R. Charette, “Why software fails,” IEEE Spectrum, vol. 42, no. 9, pp. 42–49, 2005.
[4] B. Randell, “Dependability-A unifying concept,” in Proceedings of the Conference on Com-
    puter Security, Dependability, and Assurance: From Needs to Solutions. IEEE Computer
    Society Washington, DC, USA, 1998.
[5] ——, “A computer scientist’s reactions to NPfIT,” Journal of Information Technology,
    vol. 22, no. 3, pp. 222–234, 2007.








 [6] N. G. Leveson and C. S. Turner, “Investigation of the Therac-25 accidents,” IEEE Computer,
     vol. 26, no. 7, pp. 18–41, 1993.
 [7] B. Nuseibeh, “Ariane 5: Who dunnit?” IEEE Software, vol. 14, pp. 15–16, 1997.
 [8] D. Ince, “Victoria Climbie, Baby P and the technological shackling of British childrens social
     work,” Open University, Tech. Rep. 2010/01, 2010.
 [9] T. Thein Than, M. Jackson, R. Laney, B. Nuseibeh, and Y. Yu, “Are your lights off? Using
     problem frames to diagnose system failures,” Requirements Engineering, IEEE International
     Conference on, vol. 0, pp. v–ix, 2009.
[10] H. Sözer, B. Tekinerdoğan, and M. Akşit, “FLORA: A framework for decomposing software
     architecture to introduce local recovery,” Software: Practice and Experience, vol. 39, no. 10,
     pp. 869–889, 2009. [Online]. Available: http://dx.doi.org/10.1002/spe.916
[11] F.-Z. Zou, “A change-point perspective on the software failure process,” Software
     Testing, Verification and Reliability, vol. 13, no. 2, pp. 85–93, 2003. [Online]. Available:
     http://dx.doi.org/10.1002/stvr.268
[12] A. Bertolino and L. Strigini, “Assessing the risk due to software faults: Estimates of
     failure rate versus evidence of perfection,” Software Testing, Verification and Reliability,
     vol. 8, no. 3, pp. 155–166, 1998. [Online]. Available: http://dx.doi.org/10.1002/(SICI)1099-
     1689(1998090)8:3<155::AID-STVR163>3.0.CO;2-B
[13] Y. Dittrich, D. W. Randall, and J. Singer, “Software engineering as cooperative work,”
     Computer Supported Cooperative Work, vol. 18, no. 5-6, pp. 393–399, 2009.
[14] C. R. B. de Souza, D. Redmiles, L.-T. Cheng, D. Millen, and J. Patterson, “Sometimes
     you need to see through walls: A field study of application programming interfaces,” in
     CSCW ’04: Proceedings of the 2004 ACM conference on Computer supported cooperative
     work. New York, NY, USA: ACM, 2004, pp. 63–71.
[15] C. de Souza, J. Froehlich, and P. Dourish, “Seeking the source: Software source code as a
     social and technical artifact,” in GROUP ’05: Proceedings of the 2005 international ACM
     SIGGROUP conference on Supporting group work. New York, NY, USA: ACM, 2005, pp.
     197–206.
[16] J. Aranda and G. Venolia, “The secret life of bugs: Going past the errors and omissions in
     software repositories,” in Proceedings of the 2009 IEEE 31st International Conference on
     Software Engineering. IEEE Computer Society, 2009, pp. 298–308.







              Presupposition Analysis in Requirements

                                     Lin Ma
                                 l.ma@open.ac.uk

Supervisors          Prof. Bashar Nuseibeh
                     Prof. Anne De Roeck
                     Dr. Paul Piwek
                     Dr. Alistair Willis
Department/Institute Department of Computing
Status               Fulltime
Probation viva       After
Starting date        1-Feb-2009



Motivation

Natural language is the most commonly used representation language in requirements
engineering [1]. However, compared with formal logics, natural language is
inherently ambiguous and lacks a formal semantics [2]. Communicating requirements
perfectly through natural language is thus not easy. Examining the linguistic
phenomena in natural language requirements can help with decoding what a person
means in communication. This method was originally used in psychotherapy and then
adopted in requirements engineering [3]. Presupposition is one of these linguistic
phenomena. It simplifies communication by pointing to references to bits of
knowledge that are taken for granted by the document writer. In requirements
engineering, however, we must know exactly what information we’ve lost by
simplification, or we run the risk of a misunderstanding. For instance, the requirement

(1) Accessibility in the experimental hall is required for changing the piggy board
    where the device will be mounted.

commits the reader to the presuppositions that there is an experimental hall, there is a
piggy board and there is a device. These types of implicit commitments might be
misinterpreted or overlooked due to different background knowledge in the other
stakeholder’s domain. More precisely, for instance, concerning the presupposition that
there is a piggy board in example (1), the reader of this requirement may know a
piggy board A and choose to believe A is the thing that the document writer is writing
about. However, the document writer may mean piggy board B or just any new piggy
board. In this research, we propose to use natural language processing techniques for
automatically detecting such implicit commitments in requirements documents, and
identifying which of those are not made explicit.

Background

Presuppositions are triggered by certain types of syntactic structures – presupposition
triggers [4]. Therefore, presuppositions can be found by identifying the triggers in the
text. The presupposition trigger types can be divided into two general classes –
definite descriptions (noun phrases starting with determiners, such as the piggy board
in example (1)) and other trigger types (for example, clefts – it + be + noun +
subordinate clause – and stressed constituents – words in italics in a text). Definite
descriptions differ from other trigger types because they occur very frequently in all
styles of natural language [5], are easy to retrieve (because of their distinct structure
with the determiner the) and often have possible referential relations with earlier
text [6]. We hence focus on presuppositions triggered by definite descriptions in this
research.

One major problem in the study of presupposition is presupposition projection. An
elementary presupposition is a presupposition of part of an utterance. Presupposition
projection, as the name suggests, is the study of whether an elementary presupposition
is a presupposition of the whole utterance (termed an actual presupposition). Here two
examples are given for distinct scenarios in requirements, one where an elementary
presupposition projects out and one where it does not:

(2) a. If funds are inadequate, the system will notify….
   b. If there is a system, the system will notify…

Intuitively, when a reader accepts utterance (2b), he/she does not take the
presupposition that there is a system for granted. The elementary presupposition that
there is a system in the consequent of the conditional somehow does not project. The
same elementary presupposition that there is a system nevertheless projects out in
example (2a), which signals to the reader that the document writer takes for granted
that there is a system.

Methodology

The Binding Theory [7] of presupposition is a widely accepted formal framework for
modelling presupposition, in which presupposition is viewed as anaphora (anaphora
are expressions, such as pronouns, which depend for their interpretation on a
preceding expression, i.e., an antecedent). Presupposition projection is treated as
looking for a path to an earlier part of the discourse which hosts an antecedent that
can bind the presupposition. Whenever an antecedent is found in the discourse, the
presupposition is bound, and thus does not project out. Therefore, according to the
Binding Theory, the actual presuppositions in a discourse are those which do not have
any antecedent earlier in the discourse. We adopt this view as our theoretical
grounding.

[8] presents an automated approach for classifying definite descriptions. This
approach is compatible with the Binding Theory. It classifies definite descriptions as:

     Discourse new: those whose interpretation is independent of previous discourse
     elements (according to the Binding Theory, discourse new definite descriptions
     introduce actual presuppositions with respect to a discourse, because they do not
     have any antecedent);




                                       Page 52 of 125
2010 CRC PhD Student Conference



     Anaphoric: those that have co-referential 1 antecedents in the previous discourse
     (co-reference is defined as multiple expressions in a sentence or document having
     the same referent);

     Bridging [9]: those that either (i) have an antecedent denoting the same discourse
     entity but using a different head noun (e.g. a house . . . the building), or (ii) are
     related by a relation other than identity to an entity already introduced in the
     discourse (e.g. the part-of relation between memory…the buffer).

Given example (3), “the experimental hall” has an antecedent in the previous sentence
– “an experimental hall”, so it will be classified as anaphoric. If we somehow have the
knowledge that a piggy board is a small circuit board mounted on a larger board, “the
piggy board” is a bridging definite description referring to part of “PAD boards”.
Finally, “the device” is a discourse new definite description which triggers the actual
presupposition that there is a device with respect to the discourse.

(3) An experimental hall shall be built….
    PAD boards shall be used….
    Accessibility in the experimental hall is required for changing the piggy board
    where the device will be mounted.

In [8], the authors used a set of heuristics based on an empirical study of definite
descriptions [6] for performing the classification task. The heuristics include, for
example:

     For discourse new definite descriptions: one of the heuristics is to examine a list
     of special predicates (e.g. fact). If the head noun of the definite description
     appears in the list, it is classified as discourse new.

     For anaphoric definite descriptions: matching the head noun and modifiers against
     earlier noun phrases. If there is a match, the description is classified as anaphoric.
     For example, An experimental hall…the experimental hall.

     For bridging: one of the heuristics is to use WordNet [10] to identify relations
     between the head noun and earlier noun phrases. If there is a relation, such as a
     part-of relation, the description is classified as bridging. For example, PAD
     boards…the piggy board.
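
The sketch below illustrates, in very simplified form, how heuristics of this kind might be chained: a
list of "special" head nouns for discourse-new descriptions, head-noun matching against earlier noun
phrases for anaphoric ones, and a lexical relation lookup (standing in for WordNet) for bridging ones.
The class name, the tiny word lists and the ordering of the checks are assumptions made for this
illustration, not the system of [8].

import java.util.*;

// Illustrative sketch of chained heuristics for classifying definite descriptions.
// The word lists and relation table are invented; a real system would use WordNet
// or a larger knowledge source, and far more sophisticated matching.
public class DefiniteDescriptionSketch {

    enum Category { DISCOURSE_NEW, ANAPHORIC, BRIDGING }

    // Head nouns that typically introduce discourse-new descriptions (e.g. "the fact that ...").
    static final Set<String> SPECIAL_PREDICATES = Set.of("fact", "idea", "result");

    // A stand-in for WordNet-style relations: maps a head noun to a noun it is part of.
    static final Map<String, String> PART_OF = Map.of("piggy board", "pad board");

    static Category classify(String headNoun, List<String> earlierHeadNouns) {
        if (SPECIAL_PREDICATES.contains(headNoun)) {
            return Category.DISCOURSE_NEW;              // special predicate heuristic
        }
        if (earlierHeadNouns.contains(headNoun)) {
            return Category.ANAPHORIC;                  // head noun matches an earlier noun phrase
        }
        String whole = PART_OF.get(headNoun);
        if (whole != null && earlierHeadNouns.contains(whole)) {
            return Category.BRIDGING;                   // related (part-of) to an earlier noun phrase
        }
        return Category.DISCOURSE_NEW;                  // default: introduces a new discourse entity
    }

    public static void main(String[] args) {
        List<String> earlier = List.of("experimental hall", "pad board");
        System.out.println(classify("experimental hall", earlier)); // ANAPHORIC
        System.out.println(classify("piggy board", earlier));       // BRIDGING
        System.out.println(classify("device", earlier));            // DISCOURSE_NEW
    }
}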

However, as stated by the authors of [8], this approach is insufficient to deal with
complex definite descriptions with modifiers and lacks a good knowledge base to
resolve bridging definite descriptions (WordNet performed rather poorly in this
case). In my research, we will further develop this approach and implement a software
system that is able to analyze the projection behavior of presuppositions triggered by
definite descriptions in requirements documents. The development focus is on
analyzing modifiers of definite descriptions and making use of external knowledge
sources (such as ontologies built upon Wikipedia [11]) for resolving bridging definite
descriptions. Especially for bridging definite descriptions, if the relation can be
identified in the knowledge base, it will help with making a choice between creating a
new discourse entity or picking up an existing antecedent. As a result, the actual
presuppositions (the discourse new definite descriptions) can be identified. The
system will be evaluated through existing corpora with annotated noun phrases, such
as the GNOME corpus [12]. We will also manually annotate several requirements
documents and perform the evaluation on the annotation results.

1
  In a strict sense, the concept of anaphora is different from co-reference because the former requires
the meaning of its antecedent for interpretation, while the latter does not. Here they are used as
synonyms, meaning that multiple expressions in a sentence or document have the same referent.


References

[1] L. Mich and R. Garigliano, “NL-OOPS: A requirements analysis tool based on
     natural language processing,” Proceedings of the 3rd International Conference
     on Data Mining Methods and Databases for Engineering, Bologna, Italy: 2002.
[2] V. Gervasi and D. Zowghi, “Reasoning about inconsistencies in natural language
     requirements,” ACM Transactions on Software Engineering and Methodology
     (TOSEM), vol. 14, 2005, pp. 277–330.
[3] R. Goetz and C. Rupp, “Psychotherapy for system requirements,” Cognitive
     Informatics, 2003. Proceedings. The Second IEEE International Conference on,
     2003, pp. 75–80.
[4] S.C. Levinson, Pragmatics, Cambridge, UK: Cambridge University Press, 2000.
[5] J. Spenader, “Presuppositions in Spoken Discourse,” Phd. Thesis, Department of
     Linguistics Stockholm University, 2002.
[6] M. Poesio and R. Vieira, “A corpus-based investigation of definite description
     use,” Computational Linguistics, vol. 24, 1998, pp. 183–216.
[7] R.A. Van der Sandt and B. Geurts, “Presupposition, anaphora, and lexical
     content,” Text Understanding in LILOG, O. Herzog and C. Rollinger, Eds.,
     Springer, 1991, pp. 259-296.
[8] R. Vieira and M. Poesio, “An empirically based system for processing definite
     descriptions,” Computational Linguistics, vol. 26, 2000, pp. 539–593.
[9] H.H. Clark, “Bridging,” Thinking, 1977, pp. 411–420.
[10] C. Fellbaum, WordNet: An Electronic Lexical Database, Cambridge, MA: MIT
     press, 1998.
[11] M.C. Müller, M. Mieskes, and M. Strube, “Knowledge Sources for Bridging
     Resolution in Multi-Party Dialog,” Proceedings of the 6th International
     Conference on Language Resources and Evaluation, Marrakech, Morocco:
     2008.
[12] M. Poesio, “Annotating a corpus to develop and evaluate discourse entity
     realization algorithms: issues and preliminary results,” Proc. of the 2nd LREC,
     2000, pp. 211–218.








    Merging Verifiable and Evolving Access Control Properties
                                     Lionel Montrieux
                                L.M.C.Montrieux@open.ac.uk


     Supervisors         Dr Charles B. Haley, C.B.Haley@open.ac.uk
                         Dr Yijun Yu, Y.Yu@open.ac.uk
     Department          Computing
     Status              Full-time
     Probation viva      not passed
     Starting date       October 2009

1     Introduction
  Recent years have seen a strong advance in formal methods for security [Jür05]. Many successes
have been obtained: many security protocols have been proved to be flawed, and many others to
be correct in a precise sense delimiting exactly their applicability.

  UMLsec is an extension of UML that allows developers to weave security aspects into a standard
UML model. The UMLsec tool [Jür04] allows them to check that their models satisfy the security
properties they want to enforce.

  Yet, the growing demand to evolve systems continuously raises new questions and new research
opportunities. Not only is it necessary to make sure that a system meets security requirements,
but it is also crucial to make sure that those requirements are still met by the system at each
step of its constant evolution. Hence, it is necessary to develop processes and tools that help
developers ensure lifelong compliance with security, privacy or dependability requirements.

   Specifically, access control plays an important role in protecting assets from unauthorised access.
Several access control models, like Role-Based Access Control (RBAC) [SFK00] or Organization-
Based Access Control (OrBAC) [ABB+ 03] have been defined to help administrators grant
permissions to users in an easy and scalable way, while allowing permission changes to be easily made.
With complex software, maintaining a sound access control infrastructure and ensuring properties
like separation of duty can become a challenge. Processes and tools that can verify such properties
against a given model as well as all of its evolutions are necessary to increase confidence in one’s
access control infrastructure.


2     Verification of Access Control properties in UMLsec
  The verification process we propose is made of three different parts: first, we want to extend
the existing RBAC specification in UMLsec to allow one to specify more complex access control
properties. Then, we want to verify that a given model actually enforces the UMLsec access
control specification. Finally, we generate code that conforms to the access control property that
has previously been defined and verified.







2.1    Extending the UMLsec specification of RBAC
   UMLsec includes a set of properties to specify RBAC permissions, using the RBAC stereotype on
an Activity diagram [Jür05]. However, it supports only a limited subset of the RBAC standard.
We want to develop it to include other levels of RBAC standard compliance, as well as other
similar access control models, like OrBAC. We also want to model authentication procedures using
UMLsec, and to allow one to automatically integrate the UMLsec property into other diagrams,
like class diagrams and sequence diagrams, once the initial property has been defined on one or
several activity diagrams.

  Other approaches have been proposed to model RBAC permissions on UML models, like
SecureUML [LBD02]. SecureUML differs from UMLsec as it focuses on RBAC only. The way RBAC
properties are represented is also different: instead of using stereotypes and tagged values to
annotate the model, the SecureUML approach adds classes to a class diagram to describe users,
roles and permissions, and uses OCL [OMG10] to describe additional constraints. Access control
directives, like EJB configuration files, can also be generated from a SecureUML model.

2.2    Verifying a UMLsec property
  Once the UMLsec property has been defined, we want to make sure that the model actually
enforces it. Not only do we want to make sure that the model doesn’t allow a user to perform
an operation s/he’s not authorised to perform, but we also want to make sure that rules like
Separation of Duty are actually enforced. Verification of the enforcement of the access control
definition by the model already exists for the current UMLsec RBAC property, but is limited to
activity diagrams. With the extended access control model that we propose come new challenges
to verify the suitability of the model. Not only will we have to verify new properties on the activity
diagram, but we will also have to verify the other diagrams of the model that may contain access
control rules: class diagrams, sequence diagrams, . . .
Since the access control definition might be spread over several diagrams, we will also have to
verify that it doesn’t contain any contradiction.
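
By way of illustration, the sketch below checks one such rule – static separation of duty – against a
simple role/permission assignment that might be extracted from a design. The data structures and the
specific conflicting-role pair are invented for this example; the actual verification operates on
UMLsec-annotated UML diagrams rather than on Java objects.

import java.util.*;

// Illustrative sketch only: checking a static separation-of-duty rule over
// user-to-role assignments that could be extracted from an annotated model.
public class SeparationOfDutyCheck {

    // Pairs of roles that no single user may hold at the same time (assumed example).
    static final Set<Set<String>> CONFLICTING_ROLES =
            Set.of(Set.of("PaymentApprover", "PaymentCreator"));

    static List<String> violations(Map<String, Set<String>> rolesPerUser) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : rolesPerUser.entrySet()) {
            for (Set<String> conflict : CONFLICTING_ROLES) {
                if (entry.getValue().containsAll(conflict)) {
                    problems.add(entry.getKey() + " holds conflicting roles " + conflict);
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> assignment = Map.of(
                "alice", Set.of("PaymentCreator"),
                "bob",   Set.of("PaymentCreator", "PaymentApprover")); // violates the rule
        violations(assignment).forEach(System.out::println);
    }
}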

2.3    Code generation from a UMLsec specification
   Once access control permissions have been defined for a model using UMLsec, we want to
generate code that actually enforces them. We compared two different code generation approaches
from the existing RBAC UMLsec property. The first one produces Object-Oriented code, while the
second one produces Aspect-Oriented code [IKL+ 97] to enforce the access control permissions,
together with Object-Oriented code for the functional code. It seems that the second solution
provides a better implementation, since the access control enforcement code is clearly separated
from the functional code. It also makes further changes to the code easier to perform, and makes
the traceability between the code and the UMLsec access control property easier to maintain.
Moreover, the current implementation only generates code for the JAAS framework [jaa01]. We
would like to offer the possibility to generate code for other frameworks as well.
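
As a rough illustration of the difference between the two generation targets, the sketch below shows
the kind of guard an object-oriented generator might weave directly into a business method; an
aspect-oriented generator would instead emit advice that wraps such methods, leaving the functional
code untouched. The class, role and method names are invented and do not come from the UMLsec tool.

// Illustrative sketch only: an object-oriented rendering of a generated RBAC guard.
// In the aspect-oriented variant the same check would live in a separate aspect
// (e.g. an "around" advice applied to all protected operations).
public class GeneratedAccountService {

    interface Session {                        // assumed abstraction over the authenticated user
        boolean hasRole(String role);
    }

    private final Session session;

    public GeneratedAccountService(Session session) {
        this.session = session;
    }

    // Functional code and enforcement code are mixed in the object-oriented approach.
    public void approvePayment(long paymentId) {
        if (!session.hasRole("PaymentApprover")) {            // generated from the UMLsec property
            throw new SecurityException("Role PaymentApprover required");
        }
        System.out.println("Payment " + paymentId + " approved"); // functional code
    }

    public static void main(String[] args) {
        Session approver = role -> role.equals("PaymentApprover");
        new GeneratedAccountService(approver).approvePayment(42L);
    }
}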


3     Merging conflicting access control properties
  An interesting case of evolution of a software system is the merging of conflicting access control
properties. An example might be two companies merging, each running its own software with its own
access control properties. Rationalising the new company’s information system will imply using
only one system, with only one access control property.

  We want to propose a framework, based on UMLsec, to allow one to merge several access control
properties on a given model. Conflicting definitions of roles are likely to arise, as well as conflicting
constraints and assignments. We want to give developers the opportunity to identify possible
conflicts.


                          Figure 1: Merging access control properties

  Assume that we have two different access control properties defined using UMLsec on the
same model. If we can verify that the model enforces both definitions individually, then we want
to merge those two definitions, raise possible conflicts to the user and, once those conflicts have
been resolved, ensure that the resulting access control property is also enforced by the model. This
process is described in Figure 1.
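
To give a flavour of what raising conflicts might involve, the sketch below merges two role-to-permission
maps and reports roles that are defined differently in the two properties. It is a deliberately naive
illustration using invented data structures and names; the intended framework works on UMLsec models
and handles constraints and assignments as well as role definitions.

import java.util.*;

// Illustrative sketch only: merging two access control properties and flagging
// roles whose permission sets disagree, so that a developer can resolve them.
public class AccessControlMergeSketch {

    static Map<String, Set<String>> merge(Map<String, Set<String>> a,
                                          Map<String, Set<String>> b,
                                          List<String> conflicts) {
        Map<String, Set<String>> merged = new HashMap<>(a);
        for (Map.Entry<String, Set<String>> entry : b.entrySet()) {
            Set<String> existing = merged.get(entry.getKey());
            if (existing == null) {
                merged.put(entry.getKey(), entry.getValue());
            } else if (!existing.equals(entry.getValue())) {
                conflicts.add("Role " + entry.getKey() + " differs: "
                        + existing + " vs " + entry.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> companyA = Map.of("Clerk", Set.of("createOrder"));
        Map<String, Set<String>> companyB = Map.of("Clerk", Set.of("createOrder", "approveOrder"));
        List<String> conflicts = new ArrayList<>();
        merge(companyA, companyB, conflicts);
        conflicts.forEach(System.out::println); // Role Clerk differs: ...
    }
}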


References
[ABB+ 03] A. Abou El Kalam, R. El Baida, P. Balbiani, S. Benferhat, F. Cuppens, Y. Deswarte,
          A. Miège, C. Saurel, and G. Trouessin. Organization Based Access Control, June 2003.
[IKL+ 97] John Irwin, Gregor Kiczales, John Lamping, Jean-Marc Loingtier, Chris Maeda,
          Anurag Mendhekar, and Cristina Videira Lopes. Aspect-oriented programming. Proceedings
          of the European Conference on Object-Oriented Programming (ECOOP), June 1997.
[jaa01]    JAAS tutorials, 2001. http://java.sun.com/j2se/1.5.0/docs/guide/security/jaas/tutorials/index.html
           (Last accessed September 2009).
[Jür04]    Jan Jürjens. UMLsec tool, 2004. Published at
           http://mcs.open.ac.uk/jj2924/umlsectool/index.html (Accessed Sept. 2008).

[Jür05]    Jan Jürjens. Secure Systems Development with UML. Springer-Verlag, 2005.
[LBD02]    Torsten Lodderstedt, David Basin, and Jürgen Doser. SecureUML: A UML-based modeling
           language for model-driven security, 2002.
[OMG10] OMG.         Object    constraint  language     (ocl) 2.2,             February    2010.
        http://www.omg.org/spec/OCL/2.2/ (last accessed May 2010).
[SFK00]    R. Sandhu, D. Ferraiolo, and R. Kuhn. The NIST model for role-based access control:
           towards a unified standard. In Proceedings of the fifth ACM workshop on Role-based
           access control, pages 47–63, 2000.







   Effective Tutoring with Affective Embodied Conversational 
                             Agents 
                                                 
                                                 
                                    Sharon Moyo 
                                menziwa@hotmail.com 
 
Supervisors                Dr Paul Piwek 
                           Dr Neil Smith 
Department/Institute  Computing 
Status                     Part‐time 
Probation viva             After  
Starting date              Oct 2007 
 
This  natural  language  generation  project  aims  to  investigate  the  impact  of  affect 
expression using embodied conversational agents (ECAs) in computer‐based learning 
environments. Based on the idea that there is a link between emotions and learning, 
we  are  developing  an  affect  expression  strategy.  We  will  implement  the  strategy 
within a tutoring system in two domains: Information Technology (IT) and Business 
Studies.  
 
Current research has not firmly established the impact of affect expression strategies 
within  tutorial  feedback  which  supports  learners  in  computer‐based  learning 
environments  [1].  Our  approach  is  to  provide  affective  support  through  empathy. 
Empathy  is  described  as  expressing  emotion  that  is  based  on  another’s  situation 
(target) and not merely one’s own [2]. An individual can show: parallel empathy that 
mirrors  the  target’s  emotion;  or  reactive  empathy  that  might  be  different  to  the 
target’s emotion [2]. 
 
 The empathic tutor interventions will be designed to support positive emotions [3] 
and reduce negative learner emotions [4] using a range of verbal and non‐verbal (or 
multimodal) interventions. These interventions will be combined with corrective and 
meta‐cognitive feedback [5] and presented to users as a hint or summary.  
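
The affect expression strategy itself is still being designed; purely as an illustration of the kind of
mapping such a strategy might encode, the sketch below selects an empathic intervention (parallel or
reactive) and a feedback form from a detected learner emotion and the correctness of an answer. The
emotion labels, rules and message texts are invented for this example and are not the project's algorithm.

// Illustrative sketch only: a rule-based choice between parallel and reactive empathy,
// combined with corrective feedback and presented as a hint or a summary.
public class EmpathicTutorSketch {

    enum Emotion { FRUSTRATED, BORED, CONFIDENT }

    static String respond(Emotion learnerEmotion, boolean answerCorrect) {
        String empathy;
        if (learnerEmotion == Emotion.FRUSTRATED) {
            // Parallel empathy: mirror the learner's emotion before helping.
            empathy = "I can see this one is frustrating.";
        } else {
            // Reactive empathy: respond with a different, supportive emotion.
            empathy = "You are doing well to keep going.";
        }
        String feedback = answerCorrect
                ? "Summary: your answer was correct because it applied the rule properly."
                : "Hint: re-read the question and check the first step again.";
        return empathy + " " + feedback;
    }

    public static void main(String[] args) {
        System.out.println(respond(Emotion.FRUSTRATED, false));
    }
}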
 
We will conduct a series of studies. Initially, we intend to develop, implement and
evaluate an algorithm that generates multimodal empathic behaviours using an ECA.
The  experiment  conditions  will  include  multimodal  channels  of  communication: 
speech  vs.  speech  and  facial  expression  vs.  speech  and  gesture  vs.  speech,  facial 
expression  and  gesture.  We  hypothesize  that  participants  will  identify  the  ECA’s 
expression  most  accurately  in  the  condition  using  three  channels  to  generate 
affective expressions in comparison to the other conditions. 
 
 
 
 
 





 
Additionally we aim to evaluate when and how parallel or reactive empathy can be 
used  to  best  effect  in  learning  environments.  Subsequently,  we  will  integrate  the 
algorithm into a web‐based tutoring environment and conduct an evaluation in the 
domain of Business Studies. Finally, in the main study we will evaluate the empathic 
tutoring  system  in  a  classroom  setting  over  several  weeks  in  the  domain  of 
Information Technology (IT).  
 
We  intend  to  contribute  to  current  research  by  describing  how  an  ECA  can 
effectively  express  multimodal  [6]  empathic  behaviour  within  computer‐based 
learning.  More  specifically,  we  aim  to  create  a  framework  to  model  parallel  and 
reactive empathy and the learning contexts where they can be used in a quiz‐based 
web  environment.  We  intend  to  validate  these  results  through  evaluations  across 
two  domains:  Information  Technology  and  Business  demonstrating  that  the 
framework can be applied to other quiz‐based learning environments.  
 
References: 
 
1.      Arroyo,  I.,  et  al.  Designing  Affective  Support  to  Foster  Learning,  Motivation 
        and Attribution. in AIED 2009. 2009. Brighton, UK: IOS. 

2.      Davis,  M.,  Empathy:  A  Social  Psychological  Approach.  1994,  Madison,  WI: 
        Brown and Benchmark. 

3.      Bickmore, T. and D. Schulman, Practical approaches to comforting users with 
        relational  agents,  in  CHI  '07  extended  abstracts  on  Human  factors  in 
        computing systems. 2007, ACM: San Jose, CA, USA. 

4.      Burleson,  W.,  Affective  learning  companions:  Strategies  for  empathetic 
        agents  with  real‐time  multimodal  affective  sensing  to  foster  meta‐cognitive 
        and  meta‐affective  approaches  to  learning,  motivation,  and  perseverance.  . 
        2006, Massachusetts Institute of Technology: Cambridge, MA. 

5.      Tan,  J.  and  G.  Biswas.  The  Role  of  Feedback  in  Preparation  for  Future 
        Learning:  A  Case  Study  in  Learning  by  Teaching  Environments.  in  ITS  2006. 
        2006: Springer‐Verlag. 

6.      Cassell,  J.,  et  al.  Animated  Conversation:  Rule‐Based  Generation  of  Facial 
        Expression,  Gesture  and  Spoken  Intonation  for  Multiple  Conversational 
        Agents. . in Siggraph 94, ACM SIGGRAPH. 1994: Addison Wesley. 
 
 







Evaluating a mobile learning environment in a home care
domain

Brendan Murphy
brendan.murphy@cordia.co.uk

Supervisors          Dr Shailey Minocha
                     Dr Mark Woodroffe
Department/Institute Computing
Status               Part time
Probation viva       After
Starting date        January 2008

BACKGROUND
The growth in wireless mobile infrastructure and the rise in functionality of handheld
smartphones have opened up opportunities for advanced use beyond traditional voice and
limited data management. One such opportunity is mobile learning. There has
been a great deal of debate about this subject and the term itself has proved difficult to
define. Substantial amounts of research have been carried out into mobile learning in
the education sector; however, significantly less has been carried out relating to
mobile learning in the workplace, and none concerning mobile learning in a home
care 1 environment.

RESEARCH QUESTIONS
My research project sets out to investigate the success of mobile learning in a home
care domain. I am interested in discovering whether there is a difference in learning
outcomes between learning carried out in the classroom and similar learning carried out in
a mobile environment. Understanding the drivers that encourage home care staff to
engage in mobile learning, as well as the role played by technology in learning
activities, is also of importance.

My research questions are as follows:

         Is learning more successful when carried out in a situated, mobile
         environment, than similar learning completed in the classroom?

         What processes do learners go through to achieve their learning outcomes
         when using mobile technology to learn?

         What conclusions can be drawn from the above to influence the
         development and design of mobile learning environments?




1
 Home care refers to a range of services provided to the elderly and those with care needs on a regular
basis. Services are commissioned by local authorities and provided by qualified home care staff
working in the community.





KEY RESEARCH
Research in the home care domain has been limited to the investigation of using
handheld mobile computing devices for enhanced communication and for access to
back-office client care plans.

In Sweden, the government provided assistant nurses (home help personnel, or HHPs)
with PDAs containing patient information and care visit information (scheduling).
This removed the need for carers to visit administrative offices regularly to pick up
work and access client information. A research project looked at the navigational
aspects of the software used on the PDAs and its overall ease of use. The project
focused on a user interface design that presented a large data set as tabs of related
information that HHPs could navigate easily. The findings concluded that ease of use
was important to the care workers, as was access to integrated patient detail that
brought disparate systems together into a single application. The requirement to use
the system to record information was less important to HHPs than being able to view
information (Scandurra, Hagglund et al. 2004).

A relevant example of home care research took place in the Saskatoon District Health
jurisdiction in Canada. Here, care workers operate in the same loosely-coupled way
that they do in my own organisation. Loose coupling refers to carers having
variability in their work schedules, few meetings with team members and a high
degree of work autonomy. A study of the communication between carers drew
interesting conclusions. Carers found communicating with other team members to
require considerable effort and only did so when the need was urgent; they preferred
asynchronous communication routes, which allowed them to communicate whilst
maintaining flexibility in their schedules; because synchronous communication was
difficult, care workers were judicious and prioritised any communication that needed
to be carried out; and learning about other care workers was done through 'traces'
left in shared work locations such as the homes of the elderly patients they were
visiting (Pinelle and Gutwin 2003). The findings of this study provide evidence
that home carers themselves chose the best ways to communicate in a loosely-coupled
work environment. These findings may influence my research project in that carers
may choose to learn in the same way that this study shows they communicate,
essentially making personal decisions about when to learn and whether this learning
is effective.

My research project considers mobile learning, and the literature identifies models
that have been applied conceptually and practically to better understand this area of
learning. Two models that are of relevance to my own research are discussed briefly
below.

Sharples asserts that current theories of learning do not fully take advantage of the
opportunities provided in a mobile environment. Taking learning out of the standard
classroom environment and placing it in the learner's own environment makes it more
personal, situated and collaborative (Sharples, Taylor et al. 2005). The task model of
mobile learning considers two superimposed layers. The semiotic layer is concerned
with how the learner gains new knowledge and skills mediated by the learning
environment they are in, and the technological layer considers how the learner
engages with technology to learn. These two layers enable the model to be used
flexibly in research, as they allow the focus to be placed either on the tools used for
learning or on the success of the learning itself. The model also considers the
socio-cultural factors of control, context and communication in mobile learning.
These are important considerations, as learning is rarely a singular activity, relying on
interactions with educators, technologies and other learners. Frohberg explains these
factors as follows: the portability of mobile learning provides the means to learn in
context, which in turn presents the challenge of moderating the learning in some way
(control), and learning is more successful when learners and educators share and
learn together (communication) (Frohberg, Goth et al. 2009).

A second relevant model is Koole and Ally's 'Framework for the Rational Analysis of
Mobile Education' (FRAME) model, which can be used to help inform a better
understanding of mobile learning. The model was designed to determine the
effectiveness of devices used for mobile learning as well as to address the
relationship between mobile learning, the human capability to learn and social
interaction in learning. The FRAME model helps researchers gain a better
understanding of the complex nature of mobile learning. It considers three aspects:
the device, the learner and the social aspects of mobile learning. The model asserts
that the convergence of these aspects can lead to better collaboration amongst
learners, better access to information and a better understanding of learning (Koole
and Ally 2006).

RESEARCH METHODOLOGY
Adopting a socially constructed approach to research is important where the
researcher needs to be closely absorbed and involved in the research process itself. In
relation to this approach, Saunders stresses the importance of understanding the
subjective meanings that motivate actions in order to fully understand the actions
themselves (Saunders, Lewis et al. 2003). The home care domain is rich and complex,
which makes it suited to this type of approach. Table 1 summarises my research
activities and proposed methods.

Phase                 Research Question                  Methods Used
ONE                   Is learning more successful        Survey
                      when carried out in a situated,
                      mobile environment, than           Direct observation
                      similar learning completed in
                      the classroom?                     User
                                                         profiles/personas
TWO                   What processes do learners go      Diary study
                      through to achieve their
                      learning outcomes when using
                      mobile technology to learn?
THREE                 What conclusions can be            Focus groups
                      drawn from the above to
                      influence the development
                      and design of mobile learning
                      environments?

Table 1 – proposed research methods







FINDINGS
I have carried out pilot empirical research activities with a small group of home care
staff and managers. These activities included a domain analysis and a user profile.
Their aim was to give me a more detailed understanding of the domain, of the level of
technology deployed in it, and of how successful that technology has been.

Domain analysis
The home care service in Glasgow is a complex one that provides a range of services
24 hours a day, every day of the year. Over 3,000 carers work in the service,
supervised by 220 home care co-ordinators, and 10,000 clients receive a service each
day. Clients are largely elderly people or adults with physical or learning disabilities.
Home care co-ordinators are generalists, each carrying out a full range of care duties.
Previously, specialist co-ordinators provided medical tasks such as wound dressing,
cream application and stoma care; these tasks are now carried out by all home carers.
Home care co-ordinators carry out administrative duties in support of a team of 10
basic home carers, though they also provide care services to a reduced number of
clients on a daily basis.

Organising learning in the home care service is difficult, principally because of the
logistics of covering duty shifts and the requirement to provide continuity of care for
clients at all times, which makes it undesirable to replace carers on a regular basis.
The organisational learning activities offered to home care staff are delivered from
one purpose-built centre in the north of Glasgow. Due to capacity constraints at this
learning centre, only the statutory learning required by monitoring authorities is
offered to home care co-ordinators.

User profile
The home care user profile identified carers as being largely female, in their mid-40s
and with a basic secondary school education. Carers were thought to have limited IT
ability, though this was largely a perception held by the carers themselves (and by the
IT staff supporting them); on probing, this was not necessarily the case, with home
carers showing some technical ability when using personal technologies such as MP3
players, digital TV and digital cameras.

The principal reason for home carers working in the domain is their inherent desire to
care for those who are elderly and vulnerable. Home carers adopt technology with
relative ease, and the use of new technologies at work has made them more disposed
to adopting further ones.

The role of informal technology champions when new technology is implemented is a
critical one.

ENDS







REFERENCES

Frohberg, D., C. Goth, et al. (2009). "Mobile learning projects - a critical analysis of
the state of the art." Journal of Computer Assisted Learning 25: 307-331.

Koole, M. and M. Ally (2006). Framework for the Rational Analysis of Mobile
Education (FRAME) Model: Revising the ABCs of Educational Practices.
Networking, International Conference on Systems and International Conference on
Mobile Communications and Learning Technologies.

Pinelle, D. and C. Gutwin (2003). Designing for Loose Coupling in Mobile Groups.
GROUP '03. Sanibel Island, Florida, USA, ACM: 75-84.

Saunders, M., P. Lewis, et al. (2003). Research Methods for Business Students.
Harlow, Pearson Education Limited.

Scandurra, I., M. Hagglund, et al. (2004). Integrated Care Plan and Documentation on
Handheld Devices in Mobile Home Care. MobileHCI 2004, Glasgow, Scotland.

Sharples, M., J. Taylor, et al. (2005). "Towards a Theory of Mobile Learning." Centre
for Educational Technology and Distance Learning, University of Birmingham: 9.







     Generating Accessible Natural Language Explanations for OWL
                              Ontologies
                                     Tu Anh Nguyen
                                  t.nguyen@open.ac.uk

          Supervisors                       Richard Power
                                            Paul Piwek
                                            Sandra Williams
          Department/Institute              Computing Department
          Status                            Full-time
          Probation Viva                    Before
          Starting date                     October 2009

Introduction
This research aims to develop a computational approach to generating accessible natural
language explanations for entailments in OWL ontologies. The purpose of it is to support
non-specialists, people who are not expert in description logic and formal ontology lan-
guages, in understanding why an inference or an inconsistency follows from an ontology.
This would help to further improve the ability of users to successfully debug, diagnose and
repair their ontologies. The research is linked to the Semantic Web Authoring Tool (SWAT)
project, the on-going project aiming to provide a natural language interface for ordinary
users to encode knowledge on the semantic web. The research questions are:

   • Do justifications for entailments in OWL ontologies conform to a relatively small
     number of common abstract patterns, so that we could generalise the problem to
     generating explanations by pattern?
   • For a given entailment and its justification, how can we produce an explanation in
     natural language that is accessible to non-specialists?

An ontology is a formal, explicit specification of a shared conceptualisation [6]. An ontology
language is a formal language used to encode ontologies. The Web Ontology Language,
OWL [8], is a widely used description logic based ontology language. Since OWL became
a W3C standard, there has been a remarkable increase in the number of people trying to
build and use OWL ontologies. Editing environments such as Protégé [15] and Swoop [13]
were developed in order to support users with editing and creating OWL ontologies.
As ontologies have begun to be widely used in real-world applications and more expressive
ontologies have been required, there is a significant demand for editing environments that
provide more sophisticated editing and browsing services for debugging and repairing. In
addition to the standard description logic reasoning services, namely satisfiability checking
and subsumption testing, description logic reasoners such as FaCT++ [22] and Pellet [20]
can compute entailments (e.g., inferences) to improve users' comprehension of their
ontologies. However, without some kind of explanation, it can be very difficult for users to
figure out why entailments are derived from ontologies.
The generation of justifications for entailments has proven enormously helpful for identi-
fying and correcting mistakes or errors in ontologies. Kalyanpur and colleagues defined a
justification for an entailment of an ontology as the precise subset of logical axioms from
the ontology that are responsible for the entailment to hold [12]. Furthermore, Kalyanpur
presented a user study showing that the availability of justifications had a remarkable
positive impact on the ability of users to debug and repair their ontologies [11]. Justifications
have also recently been used for debugging very large ontologies such as SNOMED [1],
whose size is too large for manual debugging and repair.
There have been several recent studies into capturing justifications for entailments in OWL
ontologies [12, 21, 9]. Nevertheless, OWL is a semantic markup language based on RDF and
XML, languages that are oriented towards machine processability rather than human readability.
Moreover, while a justification gathers together the axioms, or premises, sufficient for an
entailment to hold, it is left to the reader to work out how these premises interact with
each other to give rise to the entailment in question. Therefore, many users may struggle
to understand how a justification supports an entailment, since they are either unfamiliar
with OWL syntax and semantics or lack knowledge of the logic underpinning the
ontology. In other words, the ability of users to work out how an entailment arises from a
justification currently depends on their understanding of OWL and description logic.
In recent years, the development of ontologies has been moving from “the realm of artificial
intelligence laboratories to the desktops of domain experts”, who have insightful knowledge
of some domain but no expertise in description logic and formal ontology languages [14].
It is for this reason that the desire to open up OWL ontologies to a wide non-specialist
audience has emerged. Obviously, the wide access to OWL ontologies depends on the devel-
opment of editing environments that use some transparent medium; and natural language
(e.g., English, Italian) text is an appropriate choice since it can be easily comprehended by
the public without training. Rector and colleagues observed common problems that users
frequently encounter in understanding the logical meaning and inferences when working
with OWL-DL ontologies, and expressed the need for a “pedantic but explicit” paraphrase
language to help users grasp the accurate meaning of logical axioms in ontologies [18].
Several research groups have proposed interfaces for encoding knowledge in semantics-based
Controlled Natural Languages (CNLs) [19, 4, 10]. These systems allow users to input sen-
tences conforming to a CNL, and then parse and transform them into statements in formal
ontology languages. The SWAT project [16] introduces an alternative approach based on
Natural Language Generation. In SWAT, users specify the content of an ontology by "di-
rectly manipulating on a generated feedback text" rather than using text interpretation,
and are therefore "editing ontologies on the level of meaning, not text" [17].
The above-mentioned interfaces are designed to let non-specialists build up ontologies
without having to work directly with formal languages and description logic. However,
providing more advanced editing and browsing services on these interfaces to support the
debugging and repairing process has not yet been investigated. Despite the usefulness of
providing justifications in the form of sets of OWL axioms, understanding why entailments
or inconsistencies are drawn from ontologies is still a key problem for non-specialists. Even
for specialists, having a more user-friendly view of an ontology with accessible explanations
can be very helpful. Thus, this project seeks to develop a computational approach to
generating accessible natural language explanations for entailments in OWL ontologies in
order to assist users in debugging and repairing their ontologies.
Methodology





The research approach is to identify common abstract patterns of justifications for entail-
ments in OWL ontologies. Having identified such patterns, we will focus on generating
accessible explanations in natural language for the most frequently occurring patterns. A
preliminary study to identify the most common justification patterns has been carried out.
A corpus of eighteen real, published OWL ontologies of different expressivity was collected
from the Manchester TONES repository. In addition, the practical module developed by
Matthew Horridge, based on research on finding all justifications for OWL-DL ontologies
[12, 7], has been used. Justifications are computed and then analysed to work out the most
common patterns. Results from the study show that, of the 6,772 justifications collected,
more than 70 per cent belong to the top 20 patterns. A study on a larger and more general
ontology corpus will be carried out in the next steps. Moreover, a user study is planned to
investigate whether non-specialists perform better on a task when reading accessible
explanations rather than justifications in the form of OWL axioms.
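
To make the idea of pattern-based explanation concrete, the following sketch (plain Python;
the axiom tuples, the "subsumption chain" pattern and the English template are invented for
illustration and do not come from the project's actual pattern catalogue) matches a two-axiom
justification against one simple pattern and renders it as an English sentence.

    from typing import List, Optional, Tuple

    # Illustrative only: axioms are hand-written tuples rather than parsed OWL,
    # and the pattern and template below are invented examples.
    Axiom = Tuple[str, str, str]          # (axiom type, subject, object)

    def match_subclass_chain(justification: List[Axiom],
                             entailment: Axiom) -> Optional[List[str]]:
        """Try to read the justification as a chain A ⊑ B, B ⊑ C entailing A ⊑ C."""
        if entailment[0] != "SubClassOf" or len(justification) != 2:
            return None
        subs = {s: o for (t, s, o) in justification if t == "SubClassOf"}
        a, c = entailment[1], entailment[2]
        b = subs.get(a)
        return [a, b, c] if b is not None and subs.get(b) == c else None

    def verbalise(chain: List[str]) -> str:
        """Render the matched pattern with a fixed English template."""
        a, b, c = chain
        return f"Every {a} is a {b}, and every {b} is a {c}; therefore every {a} is a {c}."

    justification = [("SubClassOf", "GiantPanda", "Bear"),
                     ("SubClassOf", "Bear", "Mammal")]
    entailment = ("SubClassOf", "GiantPanda", "Mammal")
    chain = match_subclass_chain(justification, entailment)
    if chain is not None:
        print(verbalise(chain))
    # -> Every GiantPanda is a Bear, and every Bear is a Mammal;
    #    therefore every GiantPanda is a Mammal.

In the project itself the patterns are mined from the justification corpus and each pattern would
carry its own verbalisation template; the sketch only shows the shape of the mapping from a
matched pattern to an accessible explanation.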
The research on how to create explanations that are accessible to non-logicians is informed by
studies of proof presentation. In Natural Deduction [5], how a conclusion is derived from a set of
premises is represented as a series of intermediate statements linking the premises to the
conclusion. While this approach makes it easy for users to understand how one step leads to the
next, it can make it difficult to understand how those steps link together to form the overall
picture of the proof. Structured derivations [2], a top-down calculational proof format that allows
inferences to be presented at different levels of detail, appear to be an alternative approach to
presenting proofs; they were proposed by researchers as a method for teaching rigorous
mathematical reasoning [3]. We are currently investigating whether using structured derivations
would help to improve the accessibility of explanations, as well as where and how intermediate
inferences should be added.
Conclusion
As the desire to open up OWL ontologies to a wide non-specialist audience has emerged,
several research groups have proposed interfaces for encoding knowledge in semantics-based
CNLs. However, providing debugging and repairing services on these interfaces has not yet
been investigated. Thus, this research seeks to develop a computational approach to generating
accessible explanations that help users understand why an entailment follows from a
justification. The work includes identifying common abstract justification patterns and
investigating how to generate explanations that are accessible to non-specialists.


References

 [1] F. Baader and B. Suntisrivaraporn. Debugging SNOMED CT Using Axiom Pinpointing
     in the Description Logic EL+. In KR-MED, 2008.

 [2] R. Back, J. Grundy, and J. von Wright. Structured Calculational Proof. Technical
     report, The Australian National University, 1996.

 [3] R.-J. Back and J. von Wright. A Method for Teaching Rigorous Mathematical Rea-
     soning. In ICTMT4, 1999.

 [4] A. Bernstein and E. Kaufmann. GINO - A Guided Input Natural Language Ontology
     Editor. In ISWC, 2006.





 [5] G. Gentzen. Untersuchungen über das logische Schließen. II. Mathematische Zeitschrift,
     39:405–431, 1935.
 [6] T. R. Gruber. A translation approach to portable ontology specifications. Knowledge
     Acquisition, 5:199–220, 1993.
 [7] M. Horridge, B. Parsia, and U. Sattler. Laconic and Precise Justifications in OWL. In
     ISWC, pages 323–338, 2008.
 [8] I. Horrocks, P. F. Patel-Schneider, and F. van Harmelen. From SROIQ and RDF to
     OWL: The Making of a Web Ontology Language. J. Web Semantics, 1:7–26, 2003.
 [9] Q. Ji, G. Qi, and P. Haase. A Relevance-Directed Algorithm for Finding Justifications
     of DL Entailments. In ASWC, pages 306–320, 2009.
[10] K. Kaljurand and N. E. Fuchs. Verbalizing OWL in Attempto Controlled English. In
     OWLED, 2007.
[11] A. Kalyanpur. Debugging and repair of OWL ontologies. PhD thesis, University of
     Maryland, 2006.
[12] A. Kalyanpur, B. Parsia, M. Horridge, and E. Sirin. Finding All Justifications of OWL
     DL Entailments. In ISWC, 2007.
[13] A. Kalyanpur, B. Parsia, E. Sirin, B. Cuenca-Grau, and J. A. Hendler. Swoop: A Web
     Ontology Editing Browser. Journal of Web Semantics, 4:144–153, 2006.
[14] N. F. Noy and D. L. McGuinness. Ontology Development 101: A Guide to Creating
     Your First Ontology. Technical report, Stanford University, 2001.
[15] N. F. Noy, M. Sintek, S. Decker, M. Crubézy, R. W. Fergerson, and M. A. Musen.
     Creating Semantic Web Contents with Protégé-2000. IEEE Intell. Syst., 16:60–71,
     2001.
[16] R. Power. Towards a generation-based semantic web authoring tool. In ENLG, pages
     9–15, 2009.
[17] R. Power, R. Stevens, D. Scott, and A. Rector. Editing OWL through generated CNL.
     In CNL, 2009.
[18] A. Rector, N. Drummond, M. Horridge, J. Rogers, H. Knublauch, R. Stevens, H. Wang,
     and C. Wroe. OWL Pizzas: Practical Experience of Teaching OWL-DL: Common
     Errors & Common Patterns. In EKAW, 2004.
[19] R. Schwitter and M. Tilbrook. Controlled Natural Language meets the Semantic Web.
     In ALTW, pages 55–62, 2004.
[20] E. Sirin, B. Parsia, B. C. Grau, A. Kalyanpur, and Y. Katz. Pellet: A practical
     OWL-DL reasoner. Journal of Web Semantics, 5:51–53, 2007.
[21] B. Suntisrivaraporn, G. Qi, Q. Ji, and P. Haase. A Modularization-based Approach to
     Finding All Justifications for OWL DL Entailments. In ASWC, pages 1–15, 2008.
[22] D. Tsarkov and I. Horrocks. FaCT++ Description Logic Reasoner: System Description.
     In IJCAR, volume 4130, pages 292–297, 2006.






              Supporting the Exploration of Research Spaces
                                      Chwhynny Overbeeke
                                     c.overbeeke@open.ac.uk



                    Supervisors        Enrico Motta, Tom Heath, Paul Mulholland
                    Department         Knowledge Media Institute
                    Status             Full-time
                    Probation viva     Before
                    Starting date      December 2009


1      Introduction
It is often hard to make sense of what exactly is going on in the research community. What topics
or researchers are new and emerging, gaining popularity, or disappearing? How does this happen
and why? What are the key publications or events in a particular area? How can we understand
whether geographical shifts are occurring in a research area? There are several tools available that
allow users to explore different elements of a research area. However, making sense of the dynamics
of a research area is still a very challenging task. This leads to my research question:

How can we improve the level of support for people to explore the dynamics of a research commu-
nity?


2      Framework and Background
In order to answer this question we first need to identify the different elements, relations and
dimensions that define a research area and put them into a framework. We then need to find
existing tools that address these elements, and categorize them according to our framework in
order to identify gaps in the current level of support. Some elements we already identified are:
people, institutions and organizations, events, activity, popularity, publications, citations, time,
geography, keywords, studentships, funding, impact, and technologies.

The people element is about the researchers that are or were present in the research community,
whilst the institutions and organizations element refers to the research groups, institutions, and
organizations that are active within an area of research, and the affiliations the people within the
community have with them. Events can be workshops, conferences, seminars, competitions, or any
other kind of research-related happening. EventSeer1 is a service that aggregates all the calls for
papers and event announcements that float around the web into one common, searchable tool. It
keeps track of events, people, topics and organizations, and lists the most popular people, topics,
and organizations per week.

    1 http://www.eventseer.net








The activity element refers to how active the researchers, institutions, and organizations are within
the field, for instance event attendance or organization, or the number and frequency of publications
and events. A tool that can be used to explore this is Faceted DBLP2 , a server interface for the
DBLP server3 which provides bibliographic information on major computer science journals and
proceedings [Ley 2002]. Faceted DBLP starts with some keyword and shows the result set along
with a set of facets, e.g. distinguishing publication years, authors, venues, and publication types.
The user can characterize the result set in terms of main research topics and filter it according to
certain subtopics. There are GrowBag graphs available for keywords (number of hits/coverage).

Popularity is about the interest that is displayed in a person, institution or organization, publica-
tion, topic, technology, or event. WikiCFP4 is a service that helps organize and share academic
information. Users can browse and add calls for papers per subject category, and can add calls
for papers to their own personal user lists. Each call for papers has information on the event name,
date, location, and deadline. WikiCFP also provides hourly updated lists of the most popular
categories, calls for papers, and user lists.

One indicator of topic popularity is the number of publications on a topic. There are many tools
that show the number of publications per topic per year. PubSearch is a fully automatic web mining
approach for the identification of research trends that searches and downloads scientific publications
from web sites, typically academic web pages [Tho et al. 2003]. It extracts citations, which are
stored in the tool's Web Citation Database and used to generate temporal document clusters and
journal clusters. These clusters are then mined to find their interrelationships, which are used to
detect trends and emerging trends for a specified research area.

Another indicator of popularity is how often a publication or researcher is cited. Citations can
also help identify relations between researchers through analysis of who is citing whom and when,
and what their affiliations are. Publish or Perish is a piece of software that retrieves and analyzes
academic citations [Harzing and Van der Wal 2008]. It uses Google Scholar5 to obtain raw citations
and analyzes them, presenting a wide range of citation metrics such as the total number of papers
and citations, the average number of citations per paper and per author, the average number of
papers per author and year, an analysis of the number of authors per paper, et cetera.
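
As an illustration of the kind of metrics such tools derive from raw citation data, the short sketch
below (Python; the citation counts are invented and the script is not connected to Google Scholar
or Publish or Perish) computes an author's paper and citation totals, the average number of
citations per paper, and the h-index.

    def h_index(citations: list[int]) -> int:
        """Largest h such that the author has h papers with at least h citations each."""
        h = 0
        for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
            if cites >= rank:
                h = rank
            else:
                break
        return h

    citations = [24, 18, 11, 7, 6, 3, 1, 0]      # citations per paper, hypothetical author
    print(f"papers: {len(citations)}, citations: {sum(citations)}")
    print(f"citations per paper: {sum(citations) / len(citations):.1f}")
    print(f"h-index: {h_index(citations)}")       # -> 5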

Topics, interests, and people evolve over time, and the makeup of the research community changes
when people and organizations enter or leave certain research areas or change their direction.
Some topics appear to be more established or densely represented in certain geographical areas,
for instance because a prolific institution is located there and has attracted several experts on a
particular topic, or because many events on a topic are held in that area. AuthorMapper6 is an
online tool for visualizing scientific research. It searches journal articles from the SpringerLink7
and allows users to explore the database by plotting the location of authors, research topics and
institutions on a world map. It also allows users to identify research trends through timeline graphs,
statistics and regions.

Keywords are an important indicator of a research area because they are the labels that have been
put on publications or events by the people and organizations within that research area. Google
  2 http://dblp.l3s.de/
  3 http://dblp.uni-trier.de/
  4 http://www.wikicfp.com/
  5 http://scholar.google.com/
  6 http://www.authormapper.com/
  7 http://www.springerlink.com/








Scholar is a subset of the Google search index consisting of full-text journal articles, technical re-
ports, preprints, theses, books, and web sites that are deemed ’scholarly’ [Noruzi 2005, Harzing and
Van der Wal 2008]. Google Scholar has crawling and indexing agreements with several publishers.
The system is based on keyword search only and its results are organized by a closely guarded
relevance algorithm. The ’cited-by-x’ feature allows users to see by whom a publication was cited,
and where.

The availability of new studentships indicates that a research area is trying to attract new people.
This may mean that the area is hoping to expand, change direction, or become more established.
The availability of funding within a research area or topic is an indicator of the interest that
is displayed in it, or the level of importance it is deemed to have at a particular time. The
Postgraduate Studentships web site8 offers a search engine as well as a browsable list of study or
funding opportunities organized by subjects, masters, PhD/doctoral and professional doctorates
and a browsable list of general funders, funding universities and featured departments. The site
also lists open days and fairs.

The level of impact of the research carried out by a research group, institution, organization or
individual researcher leads to their establishment in the research community, which in turn could
lead to more citations and event attendance. The technologies element refers to the technologies
that are developed within an area of research, and their impact, popularity and establishment.
Research impact is addressed on a small scale by Scopus (http://www.scopus.com/), currently
a preview-only tool which, amongst other things, identifies and matches an organization with all
its research output, tracks how primary research is practically applied in patents, and tracks the
influence of peer-reviewed research on web literature. It covers nearly 18,000 titles from over 5,000
publishers, 40,000,000 records, scientific web pages, and articles-in-press. A tool that ranks publi-
cations is DBPubs, a system for analyzing and exploring the content of database publications by
combining keyword search with OLAP-style aggregations, navigation, and reporting [Baid et al.
2008]. It performs keyword search over the content of publications. The metadata (title, author,
venue, year et cetera) provide OLAP static dimensions, which are combined with dynamic dimen-
sions discovered from the content of the publications in the search result, such as frequent phrases,
relevant phrases and topics. Based on the link structure between documents (i.e. citations), publi-
cation ranks are computed, which are aggregated to find seminal papers, discover trends, and rank
authors.
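
One common way to compute publication ranks from citation link structure is a PageRank-style
iteration over the citation graph. The sketch below (Python; the three-paper citation graph is
invented, and this is a generic illustration of link-based ranking rather than DBPubs's actual
algorithm) runs such an iteration and prints the resulting scores.

    def rank_publications(cites: dict[str, list[str]],
                          damping: float = 0.85,
                          iterations: int = 50) -> dict[str, float]:
        """PageRank-style scores; `cites` maps each paper to the papers it cites."""
        papers = set(cites) | {p for targets in cites.values() for p in targets}
        rank = {p: 1.0 / len(papers) for p in papers}
        for _ in range(iterations):
            new_rank = {p: (1.0 - damping) / len(papers) for p in papers}
            for source, targets in cites.items():
                for target in targets:
                    # A paper passes a share of its score to every paper it cites.
                    new_rank[target] += damping * rank[source] / len(targets)
            # Score held by papers that cite nothing is simply dropped here; a full
            # implementation would redistribute it, but this suffices as a sketch.
            rank = new_rank
        return rank

    citation_graph = {"P1": ["P3"], "P2": ["P1", "P3"], "P4": ["P2", "P3"]}   # hypothetical
    for paper, score in sorted(rank_publications(citation_graph).items(),
                               key=lambda kv: -kv[1]):
        print(f"{paper}: {score:.3f}")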

Finally, we would like to discuss a more generic tool, DBLife9 [DeRose et al. 2007, Goldberg and
Andrzejewski 2007, Doan et al. 2006], which is a prototype of a dynamic portal of current informa-
tion for the database research community. It automatically discovers and revisits web pages and
resources for the community, extracts information from them, and integrates it to present a unified
view of people, organizations, papers, talks, et cetera. For example, it provides a chronological
summary, has a browsable list of organizations and conferences, and it summarizes interesting new
facts for the day such as new publications, events, or projects. It also provides community statistics
including top cited people, top h-indexed people, and top cited publications. DBLife is currently
unfinished and does not have full functionality, but from the prototype alone one can conclude it
will most likely address quite a few elements from our framework.
  8 http://www.postgraduatestudentships.co.uk/
  9 http://dblife.cs.wisc.edu/








3    Methodology
In order to find out what key problems people encounter when trying to make sense of the
dynamics of a research area, we will carry out an empirical study consisting of a task and a
short questionnaire.

The 30 to 40 minute task is to be carried out by around 10 to 12 subjects who will be asked to
investigate a research area that is fairly new to them and write a short report on their findings.
The subjects’ actions will be recorded using screen capture software and the subjects themselves
will be videoed for the duration of the task so that the entire exploration process is documented.
The screen capture will show the actions the subjects take and the tools they use to reach their
goal. The video data will show any reactions the subjects may display during their exploration
process, for example confusion or frustration with a tool they are trying to use. The questionnaire
will be filled out by as many subjects as possible, who will be asked to identify the key elements
of a research area which they would take into account when planning PhD research. In the
questionnaire people will be made aware of the framework we have created, but we will allow for
open answers and additions to the existing framework.

The technical study will consist of an overview, comparison, critical review, and gap analysis of
existing tools that support the exploration of the research community. It will link those tools to
our framework in order to find out to what extent the various elements are covered by the existing
tools.

At this stage we will have highlighted the key elements that define a research area, identified gaps
in the existing support for the exploration of the research community, and gathered evidence to
support this by mapping existing tools to our framework, carrying out a practical task, and sending
out a questionnaire. We will then aim to improve support for people to explore the dynamics of
the research community by implementing novel tools, addressing the gaps that have emerged from
these studies. Our hypothesis is that at least some of these gaps are due to the lack of integration
between different types of data covering different elements of a research area.


References
Baid, A., Balmin, A., Hwang, H., Nijkamp, E., Rao, J., Reinwald, B., Simitsis, A., Sismanis, Y.,
  and Van Ham, F. (2008). DBPubs: Multidimensional Exploration of Database Publications.
  Proceedings of the VLDB Endowment, 1(2):1456–1459.
DeRose, P., Shen, W., Chen, F., Lee, Y., Burdick, D., Doan, A., and Ramakrishnan, R. (2007).
  DBLife: A Community Information Management Platform for the Database Research Commu-
  nity. In Weikum, G., Hellerstein, J., and Stonebraker, M., editors, Proceedings of the 3rd Biennial
  Conference on Innovative Data Systems Research (CIDR 2007), Asilomar, California, USA.
Diederich, J. and Balke, W. (2008). FacetedDBLP - Navigational Access for Digital Libraries.
  Bulletin of the IEEE Technical Committee on Digital Libraries (TCDL), 4(1).
Diederich, J., Balke, W., and Thaden, U. (2007). Demonstrating the Semantic GrowBag: Au-
  tomatically Creating Topic Facets for FacetedDBLP. In Proceedings of the ACM IEEE Joint
  Conference on Digital Libraries (JCDL 2007), Vancouver, British Columbia, Canada.








Doan, A., Ramakrishnan, R., Chen, F., DeRose, P., Lee, Y., McCann, R., Sayyadian, M., and Shen,
 W. (2006). Community Information Management. IEEE Data Engineering Bulletin, Special Issue
 on Probabilistic Databases, 29.

Goldberg, A. and Andrzejewski, D. (2007). Automatic Research Summaries in DBLife. CS 764:
 Topics in Database Management Systems.
Harzing, A. and Van der Wal, R. (2008). Google Scholar as a New Source for Citation Analysis.
 Ethics in Science and Environmental Politics, 8:61–73.

Ley, M. (2002). The DBLP Computer Science Bibliography: Evolution, Research Issues, Perspec-
  tives. In Proceedings of the 9th International Symposium (SPIRE 2002), pages 481–486, Lisbon,
  Portugal.
Noruzi, A. (2005). Google Scholar: The New Generation of Citation Indexes. Libri, 55:170–180.

Tho, Q., Hui, S., and Fong, A. (2003). Web Mining for Identifying Research Trends. In Sembok,
  T., Badioze Zaman, H., Chen, H., Urs, S., and Myaeng, S., editors, Proceedings of the 6th Inter-
 national Conference on Asian Digital Libraries (ICADL 2003), pages 290–301, Kuala Lumpur,
 Malaysia. Springer.




Understanding technology-rich learning spaces

Nadia Pantidi                       k.pantidi@open.ac.uk
Yvonne Rogers                       y.rogers@open.ac.uk
Hugh Robinson                       h.m.robinson@open.ac.uk
The Open University, Walton Hall, Milton Keynes, MK7 6AA

Abstract
A number of novel technology-rich learning spaces have been developed over the last few
years. Many claims have been made in terms of how they can support and enhance learning,
collaboration, community participation, and creativity. This line of research is investigating
whether such learning spaces are living up to such claims. The approach is ethnographic; a
number of field studies have been conducted examining how people use the spaces in practice.
Findings so far have shown that the positioning of the technology, flexibility and a sense of
ownership and control over the technology are key issues.

Keywords
Technology-rich learning spaces, ethnographic approach, designed and actual use

Introduction
In the last few years, a substantial amount of funding has been allocated to schools and
universities around the world, but especially in the UK, for creating new ‘technology-rich’
learning spaces. These new spaces have been proposed as examples of future places for
supporting and enhancing informal and formal learning, collaboration, creativity and
socialising [4]. However, little is known as to whether these claims are being realized in actual
practice. This research is examining how and whether they are used, focusing on the
interdependence of physical space, furniture and technology configuration.

Background
Several studies of technology situated in educational settings have been carried out that focus
on understanding how technology affects users’ everyday life and vice versa, and whether the
technology serves the purposes it was designed for. Findings from these studies have been
mixed. For example, Brignull et al. [1] implemented Dynamo, a large multi-user interactive
surface enabling the sharing and exchange of a wide variety of digital media, in the common
room of a high school, and report that users appropriated the functionality of the display in a
way that was consistent with the space’s previous use; moreover, it did not support other uses
that the researchers expected. Similarly, McDonald et al. [3] situated three proactive displays
in an academic conference to augment the participants’ interactions, specifically to enhance
the feeling of community and to facilitate social networking and future collaborations.
Findings from this study showed that people appropriated the technology by extending its use
in an innovative and fun way which conflicted with the common practices and social
conventions already in place and thus led to negative comments about the application. More
dramatically, a study evaluating the use of interactive whiteboards in UK schools found no
significant impact of their use on pupils’ performance [2].

Much research to date has focused on single technology interventions, where a public display
or interactive whiteboard has been placed in a pre-existing space to serve a specific purpose or
functionality. However, there are learning spaces that have been designed from scratch to be
‘technology-rich’ and whose spatial and technological design is intended to be much broader
(e.g. the Saltire Centre, CILASS). An assortment of new technologies and furniture has been
configured to create new learning spaces. This research focuses on how successful these
multi-purpose spaces have been in supporting what they were designed for. The questions
addressed are:

        What are the differences between anticipated and actual use (if any)?
        What is the nature of the interactional work in these novel spaces?
        How do people behave and interact with the space?
        How do people interact with each other and the technology?
        What insights emerge for the use of the technology by understanding the use of the
        physical space?

To address these questions, in situ ethnographic studies have been carried out in three
multi-purpose technology-rich settings, called Dspace, Qspace and Cspace. Dspace was
designed as a technology-rich space set in a library on a university campus. It was created as a
creative play area for visitors to experiment with and explore new ideas and share knowledge;
a space that brings together new technologies and ideas on how they could be used for
learning and teaching now or in the future. Qspace is a large space that was designed to
support a variety of planned learning activities (e.g. workshops) to enable groups of
individuals to come together within a high technology environment to communicate their
ideas and generate their designs in a creative way. It is a blank space that can be re-shaped
physically and technologically depending on the activity that takes place. The space was
deliberately designed to be technologically rich as a means of promoting creativity and
supporting collaboration in innovative ways. Cspace was designed as a study space for
students to work together both during lab sessions and in their own time. It is a flexible
technology-rich working environment that allows multiple ‘study’ activities including
teaching, programming, hardware experimentation, and facilitated discussions.

Methodology
The method used is ethnographic, involving participant observation and semi-structured
interviews. A series of ethnographic studies has been carried out in the different settings over
the last 18 months and will continue for another 6 months. The collected data consist of
fieldnotes (made during or after the observational sessions), audio and video recordings, still
pictures and documents. The data are analyzed and interpreted in terms of prevailing themes
and tensions occurring between desired, actual and anticipated use.

Findings
As a result of the ethnographic approach, a rich description has been achieved, providing a
unique understanding of the three settings’ everyday use. In general, findings from all settings
show how people appropriate technology-rich learning spaces quite differently from what the
designers or managers have planned or anticipated. Additionally, a more in-depth examination
of the findings provides a selection of interdependent vignettes that offer insights on critical
issues such as the use of technology, the appropriation of the physical space, groupwork and
individual work, private and public aspects of interaction, and the community of users.

Regarding the use of the technology, the insights emerging so far suggest that for
technology-rich learning spaces to be successful, they need to be flexible (supporting fluid
transitions from individual work to group work and from public to private use), lightweight
(users moving between the spaces’ devices and their own) and accessible (giving users the
option to control and take ownership of the technology). For instance, fieldwork data showed
that Cspace was set up in a way that offered the students the freedom to choose how and when
to use it. The technology in the space consisted of both laptops/tablet PCs and SmartBoards,
providing users the option to switch between individual and group work, and also to share
(public) or not (private) their work with others. Moreover, the technology was ‘out there’ for
anyone to walk in and use, and students were allowed to ‘plug and play’ with their personal
devices (laptops, mp3 players, mobiles) and combine them with the existing technology of the
space (Figure 1). This technological flexibility, among other things, contributed to Cspace
becoming a ‘hot spot’; a cosy learning space where students feel comfortable experimenting
with technology and at the same time engaging in their everyday social and work activities.

Figure 1. On the left, students are collaborating by using the SmartBoard for shared content
and the laptops and tablet PCs for private use; on the right, one of the students is using his
iPhone and his personal tablet PC in combination with the existing technology.

In contrast, Qspace proved to be rather technologically inflexible. The majority of activities
involving technology, during the event observed, were limited to the managers of the space
manipulating the lights via a display interface. The actual users did not appropriate or interact
with the technology, as they did not have direct access to it. The reason for this is that before
any use of the space the managers pre-set how the technology can be used depending on the
needs of the event or the users. In addition, users are discouraged from using their own laptops
or other devices in combination with the space’s existing technology. In a way, the technology
was patrolled and used by the managers, and it was only ‘post hoc’ available to the actual
users.

Another critical element for successful technology-rich learning spaces seems to be the
physical arrangement of the technology in the space; specific spaces or physical layouts bear
established associations and etiquettes that can affect the way users interact with or
appropriate the technology. For example, in Dspace it was found that despite the abundance of
technology and the many motivating cues and clues, its use was limited. The technology was
not experimented with or played with in the ways planned for [5]. A plausible explanation for
this, based on the collected data, has to do with the positioning of the technology in the space;
most of the devices were placed on shelves (Figure 2), creating the impression that they were
for display only, thus discouraging potential users from interacting with them.

Figure 2. A collection of mobile phones for users to interact with and experiment with are
displayed on shelves.

Conclusion
This paper discusses briefly a selection of findings emerging from a series of ethnographic
studies carried out in three novel technology-rich learning spaces. Our findings so far suggest
that for these spaces to support informal and formal learning, collaboration, creativity and
socialising, issues such as the spatial arrangement, flexibility and accessibility of the
technology need to be considered. Future work involves further in situ studies in a variety of
similar settings with the aim of developing a set of design guidelines and concerns for those
involved in developing ‘learning spaces’ and ‘classrooms of the future’.

References
[1] Brignull, H., Izadi, S., Fitzpatrick, G., Rogers, Y., and Rodden, T. The introduction of a
shared interactive surface into a communal space. Proc. CSCW 2004, ACM Press (2004).

[2] Hennessy, S., Deaney, R., Ruthven, K., and Winterbottom, M. Pedagogical strategies for
using the interactive whiteboard to foster learner participation in school science. Learning,
Media and Technology, 32 (3), (2007), 283–301.

[3] McDonald, D.W., McCarthy, J.F., Soroczak, S., Nguyen, D.H., and Rashid, A.M. Proactive
displays: Supporting awareness in fluid social environments. ACM Transactions on
Computer-Human Interaction, 14 (4), Article 16, (2008).

[4] Oblinger, D. Learning Spaces. Educause, 2006.

[5] Pantidi, N., Robinson, H.M., and Rogers, Y. Can technology-rich spaces support multiple
uses? Proc. British CHI Group Annual Conference on HCI (2), BCS (2008), 135-138.



How best to support scientific end-user software development?
                               Aleksandra Pawlik
                             a.n.pawlik@open.ac.uk

Supervisors          Dr. Judith Segal
                     Prof. Marian Petre
                     Prof. Helen Sharp
Department/Institute Computing
Status               Full-time
Probation viva       Before
Starting date        October 2009

Introduction

End-user software development has received substantial amounts of attention within
both the academic and software engineering communities [1-3]. One of the sub-
groups that can be distinguished amongst end-user developers is that of scientific end-
user developers [4]. A particular set of characteristics differentiates scientists from
other end-user developers. Firstly, working in the field of science often necessitates
the use of various software packages on a daily basis. Secondly, scientists are familiar
with and utilize formal languages as well as particular modelling techniques.
Additionally, the majority of science degree curriculums offered by universities
contain at least one course in programming. Thus, many scientists have some
experience with coding at a relatively early stage of their academic and professional
career. In many cases, conducting a scientific research project means developing a
tailor-made software tool which will address a particular scientific problem. Therefore,
it may seem that scientists are “predisposed” to being effective and successful end-user
software developers, more likely than other end-user developers to produce sustainable
end-product software.
However, numerous problematic issues related to scientific end-user software
development have been reported by researchers in computing [5, 6], software
engineers [7] and scientists themselves [8]. For the purpose of my research project, I
will make the distinction between two different contexts within scientific end-user
software development:
    - Limited Context: when software is developed (usually in a purely academic
        environment) in order to address a specific problem within a particular project
        which is being run by a limited group of anticipated users;
    - Extended Context: when it is expected that the software will be reusable,
        maintainable and flexible (i.e. potentially used by an extended group of as yet
        undetermined users).
Scientific end-user software development needs, therefore, relevant and effective
support from the software development professionals’ community. Despite the fact
that some related help exists and is available [9], scientists who develop software and
software engineers who collaborate with them at various levels may find scientific
software development problematic. This indicates that the assistance and support
provided may need adjustments and improvements, an objective that may be
approached from different angles. First of all, it is essential to identify and examine
difficulties which may crop up during scientific end-user software development. The
second approach is to investigate and understand the origins of these problems.
Finally, we need to comprehend why the support available for scientific end-users
provided by the software development professionals’ community does not seem to be
working effectively and what steps should be taken to attempt to remedy this. I argue
that these steps need to involve observing the practices applied during scientific
software development in a number of different contexts.

In my PhD research project, I intend to focus on exploring the tools and methods
which scientific end-user developers employ in their work. The answer to the
question ”What techniques do scientific end-user developers use?” should allow me
to identify the ways in which scientists address issues that emerge during software
development. Additionally, I will pay special attention to the methods which scientific
end-user developers find successful. By “successful” I mean those that were
introduced and maintained during part or indeed the whole cycle of software
development, and which resulted in sustainable software. Thus, my second research
question is “What are the problematic and successful applications of tools and
techniques for supporting end-user software developers?". The results of my study
may potentially provide sufficient information which could be used to tailor and
improve ways of assisting scientific end-user development.

Background

A number of researchers investigated the characteristics and issues related to
scientific end-user development. For example, Segal [10] notes that the software
development process consists of short cycles and proposes an “iterative and
incremental” model of scientific software development which is a result of the fact
that the majority of scientific work remains experimental and is based on
approximation models. Moreover, some scientific projects involve tacit knowledge,
something which creates difficulties in establishing requirements and designing
software packages [11]. The experimental nature of these scientific projects, the
application of tacit knowledge and the approximations generated by mathematical
models create a further problem, that of software testing [12] [13].

Some problems are generated by the fact that many scientific end-user developers
make software within a very limited context of usage. The main aim of scientific
projects is to advance science, deliver and publish the findings. The resources (time,
finances and people) allocated to software development within the framework of a
scientific project tend to be insufficient [14]. Therefore, scientists’ reluctance to
learn, for example, object-oriented programming languages, and their preference
for implementing code in Fortran seem justified. Moreover, by sticking with familiar
programming languages, scientific end-user developers reduce the risk of errors that
might result from the use of languages which are new or unfamiliar to them [6]. Since,
within the scientific working culture [5], software development is not made a high
priority, scientists who develop software packages do not, as a result, receive relevant
credit, something which tends to discourage them from putting more effort into
creating sustainable software [14]. Other factors which contribute to problems with
scientific end-user software development, such as lack of effective project
management or problems with the labour division, may dissuade developers from
making use of any version control systems or configuration management tools [15].




In fact, tailor-made resources relating directly to software engineering techniques and
methods supporting scientific end-user software development are available and being
continuously developed, mainly by software development professionals [16].
However, these resources receive rather poor uptake from the scientific
community, as scientists prefer to teach themselves from, for example, generic
textbooks, colleagues, the Internet, and so on [17] [6]. Additionally, as described by
Kelly [18], the chasm that divides the different approaches to software development
between the communities of scientific end-user developers and software development
professionals only serves to cause further discrepancies in the overall communication
between the two groups.


Methodology

I intend to investigate case studies of scientific end-user software development in
which various software engineering techniques and methods were used, covering
the following:
- The transition of purely academic (Limited Context) scientific software
packages into commercial ones;
- The transition of purely academic (Limited Context) scientific software
packages into open source (Extended Context) ones;
- The development of scientific software which directly involves software
development professionals (Extended Context).

Since this PhD research project is exploratory in nature, qualitative research methods
would seem to be the most appropriate. Moreover, studies in information systems are
highly context-dependent and interpretative [19], something which requires making
use of methods that allow researchers to investigate issues in depth. I will use
interviews and participant observation as the main methods of data collection. The
interviews will be conducted with both scientific end-user developers and software
development professionals who are directly involved, together with scientists, in
scientific software development teams. The former will constitute the majority of the
respondent group whilst interviews with software development professionals will aim
to provide additional information about the application of methods and techniques for
supporting scientific end-user development. Ideally the interviews will be combined
with participant observation enabling me to obtain a fuller picture of the process and
to perceive any issues related to scientific end-user development. Two things will be
crucial in the sampling of the case studies: being able to obtain maximum variation
within the sample, but also the ability to include convenience sampling (e.g. contacting
respondents, access to the fieldwork etc.), something which will doubtless have an
impact on the final construction of the set of case studies.

References

[1]    B. A. Myers, M. M. Burnett, S. Wiedenbeck, A. J. Ko, and M. B. Rosson,
       "End user software engineering: CHI: 2009 special interest group meeting," in
       Proceedings of the 27th international conference extended abstracts on
       Human factors in computing systems Boston, MA, USA: ACM, 2009.
[2]    H. Lieberman, Paternò, F., Wulf, V., "End user development," Dordrecht, The
       Netherlands: Springer, 2006.


[3]    M. F. Costabile, P. Mussio, L. P. Provenza, and A. Piccinno, "End users as
       unwitting software developers," in Proceedings of the 4th international
       workshop on End-user software engineering Leipzig, Germany: ACM, 2008.
[4]    J. Segal and S. Clarke, "Point/Counterpoint: Software Engineers Don't Know
       Everything about End-User Programming," Software, IEEE, vol. 26, pp. 54-57,
       2009.
[5]    J. Segal, "Software Development Cultures and Cooperation Problems: A field
       Study of the Early Stages of Development of Software for a Scientific
       Community," Computer Supported Cooperative Work (CSCW), vol. 18, pp.
       581-606, 2009.
[6]    R. Sanders and D. Kelly, "Dealing with risk in scientific software
       development," Software, IEEE, pp. 21-28, 2008.
[7]    V. R. Basili, D. Cruzes, J. C. Carver, L. M. Hochstein, J. K. Hollingsworth, M.
       V. Zelkowitz, and F. Shull, "Understanding the high-performance-computing
       community: A software engineer's perspective," Software, IEEE, vol. 25, pp.
       29-36, 2008.
[8]    C. Rickett, S. Choi, C. Rasmussen, and M. Sottile, "Rapid prototyping
       frameworks for developing scientific applications: A case study," The Journal
       of Supercomputing, vol. 36, pp. 123-134, 2006.
[9]    G. Wilson, "Those Who Will Not Learn From History," Computing in Science
       and Engineering, vol. 10, p. 5, 2008.
[10]   J. Segal, "Models of scientific software development," in Workshop on
       Software Engineering in Computational Science and Engineering, Leipzig,
       Germany, 2008
[11]   S. Thew, A. Sutcliffe, R. Procter, O. de Bruijn, J. McNaught, C. C. Venters,
       and I. Buchan, "Requirements Engineering for E-science: Experiences in
       Epidemiology," Software, IEEE, vol. 26, pp. 80-87, 2009.
[12]   D. Hook and D. Kelly, "Testing for trustworthiness in scientific software," in
       Proceedings of the 2009 ICSE Workshop on Software Engineering for
       Computational Science and Engineering: IEEE Computer Society, 2009.
[13]   S. Easterbrook and T. Johns, "Engineering the Software for Understanding
       Climate Change," Computing in Science and Engineering, vol. 26, 2009.
[14]   "Reporting Back - Open Middleware Infrastructure Institute Collaboration
       Workshops 2010," http://www.omii.ac.uk/wiki/CW10ReportingBack, 2010.
[15]   M. Vigder, "End-user software development in a scientific organization," in
       Proceedings of the 2009 ICSE Workshop on Software Engineering
       Foundations for End User Programming: IEEE Computer Society, 2009.
[16]   "Software Carpentry - an intensive introduction to basic software development
       practices for scientists and engineers," http://software-carpentry.org/.
[17]   G. Wilson, "How Do Scientists Really Use Computers?," American Scientist,
       vol. 97, pp. 360-362, 2009.
[18]   D. Kelly, "A software chasm: Software engineering and scientific computing,"
       Software, IEEE, p. 120, 2007.
[19]   H. K. Klein and M. D. Myers, "A set of principles for conducting and
       evaluating interpretive field studies in information systems," MIS Quarterly,
       vol. 23, p. 67(2), 1999.







                        Non-Cooperation in
                   Computational Models of Dialogue

                                         Brian Plüss
                                     b.pluss@open.ac.uk

Supervisors          Paul Piwek
                     Richard Power
Department/Institute Computing Department
Status               Fulltime
Probation viva       After
Starting date        01/10/08


This research is aimed at finding a computable description (i.e., a computational
model) of non-cooperative conversational behaviour in political interviews.
Most approaches to finding computational models of dialogue behaviour are based on
a strong notion of cooperation between the dialogue participants (DPs). Researchers
have proposed models based on DPs' intentions (Cohen and Levesque, 1991),
regarded dialogue as a game with strict rules guiding the speaker's actions and the
hearer's interpretation of those actions (Power, 1979), or addressed conversation as
the joint construction of goal-oriented plans that are shared by DPs (Grosz and Sidner,
1990). These models successfully explain dialogue situations in which DPs recognise
each other's intentions and, at least to a certain extent, accept each other's goals when
deciding on their actions. These assumptions are theoretically grounded (Grice, 1975;
Clark and Schaefer, 1989) and also practically sound: dialogue models are usually
implemented in the form of dialogue systems, built for the purpose of providing a
service to their users. Take, for instance, the following exchange from the TRAINS
dialogue system (Ferguson et al. 1996):

         User        I need to get to Detroit, Syracuse, and Montreal.
         System      Ok.
         User        Let's send the trains from Chicago to Detroit.
         System      Ok. [draws route from Chicago to Detroit via Toledo] But the terminal
                     at Toledo is delaying traffic due to bad weather. Trains will take an
                     additional five hours to move through there. [highlights Toledo red]
         User        That's ok. Now let's go from Atlanta up to Syracuse.
         System      Please tell me a route to use to get from Atlanta to Syracuse.
         User        Go via Charleston and Pittsburgh.
                     (...)
         System      No problem. [draws route]
         User        That's great.
         System      I hope it was helpful.

In this scenario, failure to cooperate, either on the side of the system or of the user, is
against the premises on which the system is conceived and used. In everyday
conversation, however, a great many situations escape these arguments. Consider as
an example the following fragment1:

          Paxman      We're joined now from his count in Bethnal Green and Bow by
                      George Galloway. Mr Galloway, are you proud of having got rid of
                      one of the very few black women in Parliament?
          Galloway    What a preposterous question. I know it's very late in the night, but
                      wouldn't you be better starting by congratulating me for one of the
                      most sensational election results in modern history?
          Paxman      Are you proud of having got rid of one of the very few black women
                      in Parliament?
          Galloway    I'm not, Jeremy move on to your next question.
          Paxman      You're not answering that one?
          Galloway    No because I don't believe that people get elected because of the
                      colour of their skin. I believe people get elected because of their record
                      and because of their policies. So move on to your next question.
          Paxman      Are you proud...
          Galloway    Because I've got a lot of people who want to speak to me.
          Paxman      You...
          Galloway    If you ask that question again, I'm going, I warn you now.
          Paxman      Don't try and threaten me Mr Galloway, please.

This research is aimed at shedding light on the nature of non-cooperation in dialogue,
by capturing the intuitions that allow us to differentiate between both conversations in
terms of participant behaviour; and at reproducing such conversational behaviour
involving software agents. In other words, we are looking for an answer to the
following question:
      What properties are needed in a computational model of conversational
      agents so that they can engage in non-cooperative as well as in
      cooperative dialogue, in particular in the domain of political interviews?
Computational models of conversational agents are abstract, computable descriptions
of autonomous agents that are able to engage in conversation (i.e., to participate in a
dialogue displaying adequate conversational behaviour). Developing these models
and their implementation would allow for a better understanding of the workings of
dialogue. This approach is known as analysis-by-synthesis (Levinson, 1983).
Prior to the development of a computational model, it is necessary to identify
precisely the situations under study and the phenomena defining them. We achieved
this by carrying out empirical studies of naturally-occurring data. In our case, we
analysed broadcast political interviews with two main participants.
Our distinction between cooperative and non-cooperative dialogue is based on the
occurrence of particular phenomena that we call non-cooperative features (NCFs).
Intuitively, they refer to whether participants behave as is expected for the type of
dialogue in which they engage, i.e., whether they follow the obligations imposed upon
their conversational behaviour by the social context in which the exchange takes place
(Traum and Allen, 1994).
1 BBC presenter Jeremy Paxman interviews MP George Galloway, shortly after his victory in the UK
2005 General Election (http://www.youtube.com/watch?v=tD5tunBGmDQ, last access May 2010).



We have chosen political interviews as the domain for our study, because it
provides a well-defined set of scenarios, scoping the research in a way that is suitable
for a PhD project. At the same time, a wealth of interesting conversational situations
arise in political interviews. In the English-speaking world, journalists are well-known
for their incisive approach to public servants, while politicians are usually well trained
to deliver a set of key messages when speaking in public, and to avoid issues
unfavourable to their image.
For the empirical analysis, we collected a corpus of political interviews with different
levels of conflict between the dialogue participants. We proposed a technique for
measuring non-cooperation in this domain using NCFs. The number of occurrences of
these features determines the degree of non-cooperation (DNC) of an exchange.
NCFs are grouped following three aspects of conversation: turn-taking (Sacks et al.,
1974), grounding (Clark and Schaefer, 1989) and speech acts (Searle, 1979). As we
said above, they constitute departures from expected behaviour according to the social
context of the exchange. Examples of NCFs include, among others, interruptions,
overlapped speech, failure to acknowledge each other's contributions, the interviewer
expressing a personal opinion or criticising the interviewee's positions on subjective
grounds and the interviewee asking questions (except for clarification requests) or
making irrelevant comments. The DNC was computed for all the political interviews
in the corpus and preliminary results are encouraging. Adversarial interviews have a
large number of NCFs, thus a high value for the DNC. On the other hand,
collaborative exchanges have low occurrence of NCFs (or none at all).
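As an illustration only, the sketch below (in Python) shows one way such a count-based measure
could be computed from an annotated transcript; the NCF labels, the per-turn annotation format
and the normalisation by the number of turns are assumptions made for the example, not the
definitions used in this project.

    from collections import Counter

    # Hypothetical NCF labels, loosely grouped by the three aspects named above
    # (turn-taking, grounding, speech acts); the real coding scheme may differ.
    NCF_LABELS = {
        "interruption", "overlapped_speech", "no_acknowledgement",
        "interviewer_personal_opinion", "interviewee_question",
        "irrelevant_comment",
    }

    def degree_of_non_cooperation(turns):
        """turns: list of (speaker, set of NCF labels) for one interview.
        Returns the NCF counts and a per-turn normalised DNC score."""
        counts = Counter()
        for _speaker, ncfs in turns:
            counts.update(label for label in ncfs if label in NCF_LABELS)
        dnc = sum(counts.values()) / len(turns) if turns else 0.0  # assumed normalisation
        return counts, dnc

    # Toy adversarial exchange: many NCFs, hence a high DNC.
    example = [("IR", {"interviewer_personal_opinion"}),
               ("IE", {"interviewee_question", "no_acknowledgement"}),
               ("IR", {"interruption"}),
               ("IE", {"irrelevant_comment"})]
    print(degree_of_non_cooperation(example))

Under any such counting scheme, adversarial interviews accumulate many NCFs and therefore score
highly, while collaborative exchanges score close to zero.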
At the time of writing, we are designing two studies to evaluate the DNC measure.
The first is structured as an annotation exercise in which 6 annotators will code
dialogues from the corpus. The inter-annotator agreement (Krippendorff, 2004) will
indicate whether or not we are describing NCFs to an acceptable level of precision. In
the second study, participants will watch or listen to the dialogues in the corpus and
provide a judgement based on their perception of the DPs' behaviour with respect to
what is expected from them in a political interview. The correlation between results
from these studies will provide a level of confidence on the DNC measure.
As for designing the model, dialogue games supporters could say that there is a game
that describes the interaction in which Paxman and Galloway engaged in our second
example. While this might be true, such an approach would force us, in the limit, to
define one game for each possible conversation that would not fit a certain standard.
Walton and Krabbe (1995) attempt a game-based approach in their study of natural
argumentation. They claim that a rigorous model of conversational interaction is
useful, but accept that most of the huge variety of everyday conversation escapes it.
Nevertheless, the rules and patterns captured by game models are useful, as they
describe the expected behaviour of the DPs under a certain conversational scenario.
In devising our model, we aim at reconciling two worlds, using the insights from
dialogue games to provide a description of expected behaviour in the form of social
obligations, but looking at naturally occurring cases that deviate from the norm. Our
hypothesis is that non-cooperative behaviour emerges from decisions DPs make based
on conversational obligations and individual goals, with a suitable configuration of
priorities associated with each of them.
The construction of the model will be a formalization of our hypothesis, including
rules for political interviews, goals, obligations, priorities and a dialogue management
component with the deliberation mechanism. We are currently investigating the line
of research on obligation-driven dialogue modelling, initiated by Traum and Allen
(1994) and developed further by Poesio and Traum (1998) and Kreutel and Matheson
(2003). We are also implementing a prototype simulator based on the EDIS dialogue
system (Matheson et al, 2000).
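As a rough illustration of this hypothesis only (not of the model under construction), the
following Python sketch scores candidate dialogue moves against weighted conversational
obligations and individual goals; all move names, labels and weights are invented for the example.

    # Illustrative sketch: a DP chooses the move that best satisfies its weighted
    # obligations and goals. Every name and weight here is invented.

    def choose_move(candidate_moves, obligations, goals):
        """candidate_moves: list of (move, obligations_satisfied, goals_satisfied).
        obligations/goals: dicts mapping labels to priority weights."""
        def score(entry):
            _move, obls, gls = entry
            return (sum(obligations.get(o, 0.0) for o in obls)
                    + sum(goals.get(g, 0.0) for g in gls))
        return max(candidate_moves, key=score)[0]

    # An interviewee whose private goals outweigh the obligation to answer
    # will prefer a non-cooperative move such as deflecting the question.
    obligations = {"answer_question": 1.0, "acknowledge": 0.5}
    goals = {"deliver_key_message": 2.0, "avoid_damaging_topic": 1.5}
    moves = [("answer_directly", {"answer_question", "acknowledge"}, set()),
             ("deflect_and_reframe", {"acknowledge"},
              {"deliver_key_message", "avoid_damaging_topic"})]
    print(choose_move(moves, obligations, goals))  # -> deflect_and_reframe

With these invented weights the deflecting move outscores the directly cooperative one, which is
the kind of trade-off the model is intended to capture formally.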


References

H.H. Clark and E.F. Schaefer. 1989. Contributing to discourse. Cognitive science, 13(2):259–
294.

P.R. Cohen and H.J. Levesque. 1991. Confirmations and joint action. In Proceedings of the
12 th International Joint Conference on Artificial Intelligence, pages 951–957.

G. Ferguson, J.F. Allen, and B. Miller. 1996. Trains-95: Towards a mixed-initiative planning
assistant, pages 70-77. AAAI Press.

H.P. Grice. 1975. Logic and conversation. Syntax and Semantics, 3:41–58.

B.J. Grosz and C.L. Sidner. 1990. Plans for discourse. Intentions in communication, pages
417–444.

J. Kreutel and C. Matheson. 2003. Incremental information state updates in an obligation-
driven dialogue model. Logic Journal of IGPL, 11(4):485.

K. Krippendorff. 2004. Content Analysis: An Introduction to Its Methodology, second edition.
Sage, Thousand Oaks, CA.

S. C. Levinson. 1983. Pragmatics. Cambridge University Press.

C. Matheson, M. Poesio, and D. Traum. 2000. Modelling grounding and discourse obligations
using update rules. In Proceedings of the 1st NAACL conference, pages 1–8. San Francisco,
CA, USA.

M. Poesio and D. Traum. 1998. Towards an axiomatization of dialogue acts. In Proceedings
of the Twente Workshop on the Formal Semantics and Pragmatics of Dialogues, pages 207–
222.

R. Power. 1979. The organisation of purposeful dialogues. Linguistics, 17:107–152.

H. Sacks, E.A. Schegloff, and G. Jefferson. 1974. A simplest systematics for the organization
of turntaking for conversation. Language, pages 696–735.

J.R. Searle. 1979. A Taxonomy of lllocutionary Acts. Expression and meaning: studies in the
theory of speech acts, pages 1–29.

D.R. Traum and J.F. Allen. 1994. Discourse obligations in dialogue processing. In
Proceedings of the 32nd annual meeting of ACL, pages 1–8. Morristown, NJ, USA.

D. Walton and E. Krabbe. 1995. Commitment in dialogue: Basic concepts of interpersonal
reasoning. State University of New York Press.







   A Debate Dashboard to Support the Adoption of On-line
                Argument Mapping Tools

                                   Ivana Quinto
                               ivana.quinto@unina.it

Supervisors          Zollo Giuseppe
                     Iandoli Luca
Department/Institute Department of Business and Managerial Engineering
Status               Fulltime
Probation viva       After
Starting date        February, 2009


Purpose – The literature affirms that an argument map is a representation of reasoning
in which the evidential relationships among claims are made wholly explicit using
graphical or other non-verbal techniques. Several web tools, also known as argument
mapping tools, have been developed so far, which apply an organizational and
visualization approach based on argument mapping (see, e.g., Cohere, Deliberatorium,
Debategraph, Truthmapping, etc). Argument mapping provides a logical rather than
time-based debate representation of users’ contributions. This representation model has
proved to provide organizations with several advantages in knowledge sharing and
deliberation, such as: i. encouraging evidence-based reasoning and critical thinking
(Buckingham Shum and Hammond, 1994); ii. improving the understanding of large
amounts of knowledge; iii. driving conversation toward effective deliberation (van
Gelder, 2003); iv. expanding our capacity to grasp more complex discussions (Conklin,
2006).
Nevertheless those technologies still do not have widespread diffusion and the level of
adoption both in small and large scale organizations is low.
The aim of this paper is to investigate new technological solutions to support the
adoption of argument mapping tools as technology able to foster online knowledge
sharing and deliberation processes among remote workers and/or suppliers and
customers.
The literature suggests that the main barrier to adoption of mapping tools is, as for many
mediating tools, the loss of information and feedback during conversation. During a
conversation participants exchange, in addition to information, evidence and/or
requests for evidence, which help them understand whether listeners have understood
what the speakers have said (e.g., head nods or facial expressions).
Once understood, information will be used to update participants’ shared information
(common ground). This process of making the understood information part of their
common ground is called grounding process (Clark and Brennan, 1991).The grounding
process is crucial for the success of a conversation, because it helps people to
increasingly understand each other. Clark and Brennan claim that a cognitive effort is
required by people in order to ground what speakers have said during a conversation. A
possible way to measure this effort is the evaluation of grounding costs, which may vary
on the basis of the medium used to converse.




Online argument mapping tools leave users blind to a range of information that is
readily available in face-to-face interaction (Smith and Fiore, 2001), and this hampers the
level of acceptance by users.
This suggests that any mediated conversation has a higher grounding cost compared to
face-to-face conversation. Clark and Brennan (1991) and Kraut et al. (2002) identify ten
constraints that a medium can impose on conversation among people. These constraints
are desirable to reduce the ambiguity and grounding costs in conversation. Indeed, when
one of them is missing, there will be a higher grounding cost, since people will be
forced to use alternative grounding techniques.
Argumentation technology adds a further constraint to the conversation because it
forces users to respect pre-established communication formats and rules. Therefore, the
loss of immediacy, due to the formalization, coupled with the lack of information about
users, interaction processes, and generated content, imposes on users a higher cognitive
effort and more time to learn how to use the tool. This makes the benefit/cost ratio
too low for the average user, thus causing limited adoption (Davis, 1989).
As the Technology Acceptance Model (TAM) suggests, in order for a technology to be
adopted, its benefits must be higher than the costs deriving from its use.
To tackle this problem, we propose a Debate dashboard in order to provide users with
visual feedback about the interaction between users and the content generated by them.
This feedback aims at reducing grounding costs and making the benefits associated with
using of arguments maps more evident.
The dashboard will be composed of visualization tools which deliver such feedback. We
will distil the Dashboard features by building on results of a literature review on Web
2.0 tools for data visualization. In particular we will select those tools that have proved
to help effectively representing huge amounts of data and to facilitate human
understanding so that salient information becomes apparent (Nguyen & Zhang, 2006).
Design/methodology/approach – We propose a literature review of existing
visualization tools. We analysed thirty visualization tools, which have been classified on
the basis of the kind of feedback they are able to provide. We identify three classes of
feedback: Community feedback (identikit of users), Interaction feedback (about how
users interact) and Absorption feedback (about generated content and its organization).
We have to clarify that we focused on visualization tools already implemented and in
use in real online communities and not on those that were only defined and projected
“on paper”.
We analysed each of them to understand what their key features are, how they work,
what kind of feedback they provide, and if there are any “best practices”; in other
words, we used them to “inspire” the design and the implementation of the Debate
Dashboard.
As output of the literature review, we selected the following six visualization tools
(see Table 1).
As main criteria for the selection of the visualization tools, we considered:
• the number of feedback types that each of them provides, in order to reduce the number of
   used visualization tools;
• the combination of feedback, in order to provide all individualized ones.




Table 1: Selected visualization tools

Constraint /           Chat          Comment    Conversation     Exhibit   PeopleGarden   Wordle
feedback               Circles II    Flow       Map
Copresence             X
Cotemporality          X
Mobility               X
Simultaneity           X
Sequentiality                                                               X
Visibility             X
Relevance                                                                                  X
Structuring                                      X
Profile                                                           X
Activity Level                                                              X
Social/organizational                X
structure



As we have already mentioned, we consider these selected tools a sort of starting
point. Indeed, our aim is to improve them by adding further features and functions
in order to make them more effective in providing feedback.
On the basis of these six visualization tools, we set up an early visual prototype of the
Debate Dashboard.
We will test the Debate dashboard both through mapping tool expert interviews and
through a survey with a semi-structured questionnaire.
The tests aim to verify whether providing feedback about users, the interaction process and
generated content effectively reduces the grounding and sense-making costs; in other
words, we want to corroborate that this feedback reduces the users’ cognitive effort in
using online argument mapping tools.
Originality/value – Our paper enriches the debate about computer mediated
conversation and visualization tools. We propose a Dashboard prototype to augment
collaborative argument mapping tools by providing visual feedback on conversations.
The Dashboard will provide at the same time three different kinds of feedback about:
details of the participants to the conversation, interaction processes and generated
content. This will allow the improvement of the benefits and reduce the costs deriving
from the use of argument mapping tools. Moreover, another important novelty is that
visualization tools will be integrated with argument mapping tools, as until now they have
been used only to visualize data contained in forums (such as Usenet or Slashdot), chat or
email archives.
Practical implications – The Dashboard provides feedback about participants,
interaction processes and generated contents, thus supporting the adoption of online
argument mapping tools as technologies able to foster knowledge sharing among remote
workers and/or customers and suppliers. Based on this assumption several achievable
advantages can be identified:
• Improvement of the coherence of discussion (Donath, 2002) - this feedback helps
   users to participate in the conversation in the right way, as it allows users to understand
   participation rules, the structure of discussion and its evolution;
• Easy individualization of workers’ knowledge, skills and competencies - this
  happens because at any moment we can know who is talking about what and
  therefore who has that information. This allows one to identify who are the “right”
  people, who have the skills and knowledge to help co-workers and managers achieve
  their goals (Danis, 2000);
• Development/Increase of awareness of presence and activity of other workers
  (Erickson, 2000) - the awareness of activity of collaborators enables people to guide
  their individual efforts and contribute towards reaching a collaborative goal. This
  plays an important role in enabling effective collaboration among distributed work
  group members;
• Reduction of misunderstanding;
• Reduction of cognitive effort required to use mapping tools;
• Improvement of the exploration and the analysis of the maps - this feedback
  improves the usability of the object (the map), thus allowing users to pitch
  into the conversation in the right place.



Keywords: Debate dashboard, On-Line knowledge sharing, Visualization tools,
grounding cost.





References

Buckingham Shum, Simon and Hammond, Nick. 1994. “Argumentation-based design
     rationale: What use at what cost?”. International Journal of Human-Computer
     Studies, 40(4):603-652.
Clark, Herbert H. and Brennan, Susan E., 1991, “Grounding in communication”. In
     Resnick, Lauren B., Levine, John M., and Teasley, Stephanie D. (Eds), Perspectives on
     socially shared cognition, Washington, DC, US: American Psychological
     Association, pp. 127-149.
Danis, Catalina M., 2000, Extending the Concept of Awareness to include Static and
     Dynamic Person Information. SIGGROUP Bulletin, 21(3), pp.59-62.
Davis, Fred. 1989. "Perceived Usefulness, Perceived Ease of Use, and User Acceptance
     of Information Technology," MIS Quarterly, 13(3): 319-340.
Donath, Judith, 2002. “A Semantic Approach to Visualizing Online conversation”,
     Communication of the ACM, 45(4):45-49.
Conklin, Jeff, 2006. Dialogue Mapping: Building Shared Understanding of Wicked
     Problems. Chichester: Wiley.
Erickson, Thomas, and Kellogg, Wendy A. 2000. “Social translucence: an approach to
    designing systems that support social processes”. ACM Trans. Computer-Human
    Interaction,7(1):59-83.
Kraut, Robert E., Fussell, Susan R., Brennan, Susan E., and Siegel, Jane, 2002,
     “Understanding Effects of Proximity on Collaboration: Implications for
     Technology to Support Remote Collaborative Work”. In Pamela Hinds and Sara
     Kiesler (Eds), Distributed Work, Massachusetts Institute of Technology, pp.137-
     162.
Nguyen, Tien N. and Zhang Jin. 2006. “A Novel Visualization Model for Web Search
     Results”. IEEE Transaction On Visualization and Computer Graphics, 12(5):981-
     988.
van Gelder, Tim, 2003, Enhancing deliberation through computer supported argument
     mapping. In Visualizing Argumentation, eds P.A. Kirschner, S.J. Buckingham
     Shum, and C.S. Carr, pp. 97-115. London:Routledge.





 Supporting multimodal media recommendation
  and annotation using social network analysis
                                    Adam Rae


                                  a.rae@open.ac.uk


     Supervisors            Stefan Rüger, Suzanne Little, Roelof van Zwol
     Department             The Knowledge Media Institute
     Status                 Full Time
     Probation Viva         After
     Starting Date          October 2007


Research Hypothesis
      By analysing and extracting information from the social graphs de-
      scribed by both explicit and implicit user interactions, like those
      found in online media sharing systems like Flickr1 , it is possible
      to augment existing non-social aware recommender systems and
      thereby significantly improve their performance.
  1 http://www.flickr.com/

Large scale web based systems for sharing media continue to tackle the problem
of helping their users find what they are looking for in a timely manner. To do
this, lots of good quality metadata is required to sift through the data collection
to pick out exactly those documents that match the information need of the
user. In the case of finding images from the online photo sharing website Flickr,
this could be from over 4 billion examples. How can we help both the system
and the user in enriching the metadata of the media within the collection in
order to improve the experience for the user and to reduce the burden on the
underlying data handling system? Can modelling users, by themselves and
within the context of the wider online community, help? Can this modelling be
used to improve recommender systems that enhance the experience and reduce
the cognitive burden on users?
    Existing approaches tend to treat multimedia in the same way they have
dealt with text documents in the past, specifically by treating the textual meta-
data associated with an image as a text document, but this ignores the inherently
different nature of the data the system is handling. Images are visual data, and
while they can be described well by textual metadata, they cannot be described
completely by it. Also, the user cannot be ignored in the retrieval process, and
learning more about a user provides information to the system to tailor results to
their specific requirements. Users interact online, and these interactions form a
new type of data that has yet to be fully explored or exploited when modelling
users.
    The work presented here combines the mining of social graphs that occur in
Flickr with visual content and metadata analysis to provide better personalised
photo recommender mechanisms and the following experiment and its analysis
are a major component in my overall thesis.

Interaction Scenario
In order to address this research question, multiple experiments have been car-
ried out, one of which I present here:
       Envisage an incoming stream of photos made available to a user. In
       systems of a scale similar to Flickr, this could be thousands of im-
       ages per second. Can a system that uses cues from the social, visual
       and semantic aspects of these images perform better than one that
       uses the more traditional approach of using only semantic informa-
       tion, according to specifically defined objective metrics? How does
       performance vary between users?
An experiment was carried out that mines data from the social communities in
Flickr, from the visual content of images and from the text based metadata and
uses a machine learning mechanism to merge these signals together to form a
classifier that, given a candidate image and prospective viewing user, decides
whether the user would label that image as a ‘Favourite’2 - see Figure 1.
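A minimal sketch of what such a merged classifier could look like is given below (Python with
scikit-learn); the feature dimensionalities, the toy data and the choice of a logistic regression
model are assumptions made purely for illustration, not the configuration used in the experiment.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def merge_features(textual, visual, social):
        """Concatenate per-(user, image) feature vectors into a single signal."""
        return np.concatenate([textual, visual, social])

    rng = np.random.default_rng(0)

    # Toy training data: 200 (user, image) pairs; label 1 means 'Favourite'.
    X = np.stack([merge_features(rng.random(10), rng.random(8), rng.random(6))
                  for _ in range(200)])
    y = rng.integers(0, 2, size=200)

    clf = LogisticRegression(max_iter=1000).fit(X, y)

    # For a new candidate image and a prospective viewing user:
    candidate = merge_features(rng.random(10), rng.random(8), rng.random(6))
    print(clf.predict(candidate.reshape(1, -1)))  # 1 = predicted 'Favourite'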


Related Work
The significant influence that our peers can have on our behaviour online has
been studied by researchers such as Lerman and Jones [3], and the particular
interaction that occurs between users and visual media in the work of
Nov et al. [4] and Kern et al. [2]. Their insights into the importance of
understanding more about a user in order to best fulfil their information need
support the hypothesis that this kind of information can be usefully exploited
to improve systems that try to match that need to a data set supported by social
interaction. Here I extend their ideas by incorporating this valuable social data
into a complementary multimodal framework that takes advantage of multiple
types of data.
    The use of social interaction features in the work of Sigurbjörnsson and van
Zwol [7] and Garg and Weber [1] inspired my more comprehensive feature set
and its analysis. Their finding that aggregating data generated from online
communities is valuable when suggesting tags is important, and I believe it also
transfers to recommendation in general as well as to the specific task of recom-
mending images. In fact, I demonstrated this in previous work on social media
tag suggestion [6].
    I use some of the human perception based visual features outlined in the
work of San Pedro and Siersdorfer[5], as these have been shown to work well
in similar experimental scenarios and cover a range of visual classes. I extend
them further with a selection of other high performing visual features.
  2 A binary label Flickr users can use to annotate an image they like.




[Figure 1 shows an incoming stream of previously unseen candidate images passing through
textual, social and visual feature extraction, combined with information about the viewing
user (e.g. User A, who has tagged beaches before; User B, a member of an urban animals group),
into a trained classifier that outputs potential ‘Favourite’ images for each user.]
 Figure 1: Diagram of the image classification system used with Flickr data.


Experimental Work
400 users of varying levels of social activity were selected from Flickr and their
‘Favourite’ labelled images collected. This resulted in a collection of hundreds
of thousands of images. To train my classifier, these images were treated as
positive examples of relevant images. I generated a variety of negative example
sets to reflect realistic system scenarios. For all photo examples we extracted
visual and semantic features, and social features that described the user, the
owner of the photo, any connection between them as well as other behaviour
metrics. We then tested our classifier using previously unseen examples and
measured the performance of the system with a particular emphasis on the
information retrieval metric of precision at 5 and 10 to reflect our envisaged use
case scenario.
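For reference, precision at k is the fraction of the top k ranked candidates that are actually
relevant; the short sketch below shows how it could be computed for a ranked list of candidate
images, with the ranking and relevance judgements invented for the example.

    def precision_at_k(ranked, relevant, k):
        """Fraction of the top-k ranked image ids that the user actually favourited."""
        return sum(1 for img in ranked[:k] if img in relevant) / k

    ranked = ["img7", "img2", "img9", "img4", "img1",
              "img8", "img3", "img6", "img5", "img0"]   # ordered by classifier score
    relevant = {"img2", "img4", "img5", "img9"}          # actual 'Favourites'

    print(precision_at_k(ranked, relevant, 5))   # 0.6
    print(precision_at_k(ranked, relevant, 10))  # 0.4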


Results
An extract of the results from the experiment is shown in Table 1. They can
be summarised thus:
   • It is possible to achieve high levels of precision in selecting our positive
     examples, especially by using social features. This performance increase
     is statistically significantly higher than the baseline Textual run. These
     social signals evidently play a significant rôle when a user labels an image
     a ‘Favourite’ and can be usefully exploited to help them.
   • The value of individual types of features is complex, but complementary.
     The combined systems tend to perform better than the individual ones.
   • It is far easier to classify photos that are not ‘Favourites’ than those that
     are, as shown by the high negative values. This can be used to narrow
     down the search space for relevant images by removing those that are
      obviously not going to interest the user, thus reducing load on both the
     user and the system.




      System            Accuracy      + Prec.      + Rec.     - Prec.    - Rec.
     Textual              0.87         0.48         0.18        0.88       0.97
      Visual              0.88         1.00         0.09        0.88       1.00
      Social              0.92         0.80         0.56        0.94       0.98
 Textual+Visual           0.88         0.62         0.27        0.90       0.97
 Textual+Social           0.92         0.77         0.62        0.94       0.97
  Visual+Social           0.93         0.89         0.56        0.94       0.99
 Text+Vis.+Soc.           0.93         0.84         0.62        0.94       0.98

Table 1: Accuracy, precision and recall for various combinations of features using
the experiment’s most realistic scenario data set. Photos labelled as ‘Favourites’
are positive examples, and those that are not are negative examples. Higher
numbers are better.


   • As is typical in this style of information retrieval experiment, we can trade-
     off between precision and recall depending on our requirements. As we are
     interested in high precision in this particular experiment, we see that the
     Visual+Social and Text+Visual+Social runs give good precision
     without sacrificing too much recall.



References
[1] Nikhil Garg and Ingmar Weber. Personalized, interactive tag recommenda-
    tion for flickr. In Proceedings of the 2008 ACM Conference on Recommender
    Systems, pages 67–74, Lausanne, Switzerland, October 2008. ACM.
[2] R. Kern, M. Granitzer, and V. Pammer. Extending folksonomies for image
    tagging. In Workshop on Image Analysis for Multimedia Interactive Services,
    2008, pages 126–129, May 2008.
[3] Kristina Lerman and Laurie Jones. Social browsing on flickr. In Proceedings
    of ICWSM, December 2007.
[4] Oded Nov, Mor Naaman, and Chen Ye. What drives content tagging: the
    case of photos on flickr. In Proceeding of the twenty-sixth annual SIGCHI
    conference on Human factors in computing systems, pages 1097–1100, Flo-
    rence, Italy, 2008. ACM.
[5] Jose San Pedro and Stefan Siersdorfer. Ranking and classifying attractive-
    ness of photos in folksonomies. In WWW, Madrid, Spain, April 2009.

[6] Adam Rae, Roelof van Zwol, and Börkur Sigurbjörnsson. Improving tag
    recommendation using social networks. In 9th International conference on
    Adaptivity, Personalization and Fusion of Heterogeneous Information, April
    2010.
[7] Roelof van Zwol. Flickr: Who is looking? In IEEE/WIC/ACM Inter-
    national Conference on Web Intelligence, pages 184–190, Washington, DC,
    USA, 2007. IEEE Computer Society.





     The effect of Feedback on the Motivation of Software
                          Engineers

                                      Rien Sach
                                r.j.sach@open.ac.uk

Supervisors          Helen Sharp
                     Marian Petre
Department/Institute Computing
Status               Fulltime
Probation viva       After
Starting date        October 2009

Motivation is reported as having an effect on crucial aspects of software engineering
such as productivity (Procaccino and Verner 2005), software quality (Boehm 1981),
and a project’s overall success (Frangos 1997). Feedback is a key factor in the most
commonly used theory in reports published on the motivation of software engineers
(Hall et al. 2009), and it is important that we gain a greater understanding of the effect
it has on the motivation of software engineers.

My research is grounded in the question “What are the effects of feedback on the
motivation of software engineers?”, and focuses on feedback conveyed in human
interactions. I believe that before I can focus my question further I will need to begin
some preliminary work to identify how feedback occurs, what types of feedback
occur, and the possible impact of this feedback.

Motivation can be understood in different ways. For example, as a manager you might
consider motivation as something you must maintain in your employees to ensure
they complete work for you as quickly as possible. As an employee you might
consider motivation as the drive that keeps you focused on a task, or it might simply
be what pushes you to get up in the morning and go to work.

Herzberg (1987) describes motivation as “a function of growth from getting intrinsic
rewards out of interesting and challenging work”. That’s quite a nice definition; and
according to Herzberg motivation is intrinsic to one’s self. Ryan and Deci (2000)
describe intrinsic motivation as “the doing of activity for its inherent satisfaction
rather than for some separable consequence” (Page 60).

Herzberg (1987) defines extrinsic factors as movement and distinguishes it from
motivation, stating that “Movement is a function of fear of punishment or failure to
get extrinsic rewards”. Ryan and Deci (2000) state that “Extrinsic motivation is a
construct that pertains whenever an activity is done in order to attain some separable
outcome”.

There are 8 core motivational theories (Hall et al. 2009) and some of the theories
focus on motivation as “a sequence or process of related activities” (Hall et al. 2009)
called process theories, while others focus on motivation “at a single point in time”
(Couger and Zawacki 1980) called content theories.



As reported in a systematic literature review conducted by Beecham et al (2007), and
their published review of the use of theory inside this review in 2009 (Hall et al 2009),
the three most popular theories used in studies of motivation in Software Engineering
were Hackman and Oldham’s Job Characteristics Theory (68%), Herzberg’s
Motivational Hygiene Theory (41%), and Maslow’s Theory of Needs (21%)1.

Hackman and Oldham’s Job Characteristics Theory focuses on the physical job, and
suggests five characteristics (skill variety, task identity, task significance, autonomy,
and feedback) that lead to three psychological states which in turn lead to higher
internal motivation and higher quality work. Herzberg’s Hygiene Theory suggests that
the only true motivation is intrinsic motivation, and this leads to job satisfaction,
where extrinsic factors are only useful in avoiding job dissatisfaction.

One of the five key job characteristics in Hackman and Oldham’s theory is feedback.
Feedback is not explicitly mentioned in Herzberg’s Motivation-Hygiene Theory, but he
notes that it is a part of job enrichment, which he states is “key to designing work
that motivates employees” (Herzberg 1987). However, this is a managerial viewpoint.

In this research, software engineers are taken to be current practitioners working on
active software projects in industry: programmers, analysts, testers, and designers
who produce software for real projects.

From a management perspective, a greater understanding of what motivates
employees could prove invaluable in increasing productivity and software quality.
From an individual perspective, the prospect of receiving feedback that motivates you,
makes your job more enjoyable, and improves the quality of your work experience
could lead to a more successful and enjoyable working life.

My proposed research is divided into stages. In the first stage I plan to conduct
interviews and diary studies to identify the types of feedback in software engineering
and how feedback is experienced by software engineers. I then plan to conduct
additional studies to identify what impact this feedback has on software engineers and
how that impact is evident. Finally, I plan to observe software engineers at work to
see feedback in context, and to compare those observations to the information
gathered during the first two stages.

At the end of my PhD I hope to have produced research that leads to a greater
understanding of what feedback is within software engineering and how it is given
and received. Subsequently I wish to gain an understanding of how this feedback
alters the motivation of software engineers and how this manifests itself in behaviour,
productivity or attitude.




1
  The percentages indicate how many of the 92 papers explicitly used each theory. More than one
theory can be used in any one paper, and the 92 papers were identified in a systematic literature
review (Beecham et al. 2008) that sampled over 500 papers.


References
   B.W. Boehm, Software Engineering Economics, Prentice-Hall, 1981.
   J.D. Couger and R.A. Zawacki, Motivating and Managing Computer Personnel, John Wiley &
   Sons, 1980.
   S.A. Frangos, “Motivated Humans for Reliable Software Products,” Microprocessors and
   Microsystems, vol. 21, no. 10, 1997, pp. 605–610.
   F. Herzberg, “One More Time: How Do You Motivate Employees?”, Harvard Business School
   Press, 1987.
   J. Procaccino and J.M. Verner, “What Do Software Practitioners Really Think about Project
   Success: An Exploratory Study,” Journal of Systems and Software, vol. 78, no. 2, 2005, pp. 194–203.
   R.M. Ryan and E.L. Deci, “Intrinsic and Extrinsic Motivations: Classic Definitions and New
   Directions,” Contemporary Educational Psychology, vol. 25, no. 1, 2000, pp. 54–67.
   T. Hall et al., “A Systematic Review of Theory Use in Studies Investigating the Motivations of
   Software Engineers,” ACM Transactions on Software Engineering and Methodology, vol. 18, no. 3,
   2009, pp. 1–29.
   S. Beecham et al., “Motivation in Software Engineering: A Systematic Literature Review,”
   Information and Software Technology, vol. 50, no. 9–10, 2008, pp. 860–878.




     Using Business Process Security Requirements for IT
                  Security Risk Assessment

                               Stefan Taubenberger
                           stefan.taubenberger@web.de

Supervisors          Bashar Nuseibeh
                     Jan Jürjens
                     Charles Haley
Department/Institute Computing
Status               Part-time
Probation viva       After
Starting date        October 2007

  Companies and governmental organizations are suffering from information
technology (IT) risks caused by malicious or negligent events and by inappropriate
process designs related to authorization, access control or segregation of duties.
Examples of such events are the loss of two data discs holding 25 million child benefit
records in the UK, or the trading losses at Société Générale. Many quantitative and
qualitative methods and toolkits for IT security risk analysis have been developed
using, e.g., Bayesian probability, fuzzy theories, Courtney, or the Livermore risk
analysis methodology (LRAM)… all of which are based on probabilities and events, as
risk is defined, e.g. in ISO 27002, as a “combination of the probability of an event and
its consequence” ([3], p. 2). With these traditional risk analysis approaches, however,
IT risks often cannot be determined reliably or precisely: security events are difficult
to identify in a way that guarantees the correctness and completeness of the
identification process, since the methods provide only general descriptions of how to
identify them [7], and probabilities are difficult to estimate in practice with a sufficient
degree of precision and reliability, as statistical data is missing or outdated [6] and
influenced by perception [5].
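
  To make this limitation concrete, the following minimal sketch (illustrative only; the
event list, probability values and impact figures are hypothetical and are not taken
from ISO 27002 or any of the cited methods) shows how a traditional quantitative
assessment multiplies an estimated event probability by an estimated consequence, so
the result can only ever be as reliable as those two estimates:

    # Illustrative sketch of a traditional, probability-based IT risk estimate.
    # Events, probabilities and impact figures are made-up example values.
    events = [
        # (event, estimated annual probability, estimated impact in EUR)
        ("data discs lost in transit", 0.05, 2_000_000),
        ("unauthorised trading activity", 0.01, 10_000_000),
    ]

    def expected_annual_loss(probability, impact):
        """Risk as a combination of the probability of an event and its consequence."""
        return probability * impact

    for name, p, impact in events:
        print(f"{name}: expected annual loss = {expected_annual_loss(p, impact):,.0f} EUR")

    # Both p and impact have to be estimated; if the statistical data behind them is
    # missing, outdated or shaped by perception, the computed risk inherits that bias.
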
  IT security risk assessment approaches using business process models and security
requirements provide a way that may overcome these limitations. The use of security
requirements, as well as of business or critical assets, for risk assessment is not new:
it is described in general terms in the ISO 27000 series and implemented in
approaches such as OCTAVE Allegro [1].
  However, existing standards and approaches like the ISO 27000 series or OCTAVE
Allegro that refer to or utilize security requirements are still based on events/threats
and probabilities. Threat-based approaches face limitations regarding precision and
reliability, as they rely on probability/impact estimates as well as on correct event
identification. Furthermore, these approaches do not determine the risk of non-
adherence to, or incorrect implementation of, requirements. Other approaches use
security requirements without threats to determine the best security solutions for
processes [2] or to analyse process security [4], but do not determine risks. Approaches
that determine security solutions or analyse process security are limited as they do not
evaluate the
security risk of the current implementation. In addition, most risk assessment
approaches omit risks originating from the business process design and data flow, and
do not consider any security dependencies, as they all evaluate single decomposed
model elements. Additionally, the assessment results are dependent on a point in time
and do not consider the changing environment.
  In contrast to existing approaches, we suggest basing an IT security risk assessment
approach on business process security requirements, evaluating the corresponding
security controls as well as security processes. We evaluate the process security
requirements of a business object, including system, personnel, physical and execution
requirements; we consider security dependencies between processes; and we evaluate
standard IT security processes. An advantage of such an approach is that events and
probabilities do not have to be determined, business activity sequences as well as
security dependencies are considered, and risk results are more independent of a point
in time. Furthermore, such an approach would support the understanding and
definition of security requirements from a process and risk view.
Research objective
The objective of our research is to provide insights and knowledge about how to
conduct a risk assessment based solely on the verification of security requirements and
implemented security controls. The main focus of our research is the link between
security requirements and security controls, and whether a risk assessment can be
based completely on security requirements rather than on identifying risk with events
and probabilities. With our work we would like to address the following research
questions to achieve our objective:
    1) Can IT security risks be evaluated only with security requirements without
        using threats and probabilities with the same quality/precision as in traditional
        approaches?
    2) If we use a security requirements based risk assessment approach:
        a) How can the evaluation of security requirements be better supported to
            help identify and evaluate risks?
        b) How can we consider dependencies between security objectives or security
            requirements that influence the risk assessment result?
        c) Can we provide more time-independent risk assessment results by
            checking security processes?

Problems with risk assessments
The issues with traditional risk assessment approaches stem from the definition of
risk as a combination of events, probabilities and impact. To identify and determine
each parameter in a risk assessment, we need comprehensive knowledge about the
direct environment of the risk - e.g. a company - as well as the outside environment -
everything else. In reality, comprehensive knowledge about the direct and outside
environment is not available, may be compromised, and cannot be modelled, because
the real world is too complex and unpredictable. Even if it were possible to obtain
comprehensive knowledge, we currently do not know how to achieve or verify it.
Another fallacy is the attempt to determine risk exactly with probabilities. This would
require that all parameters, corresponding probabilities and correlations are known,
immediately updated, based on sufficient statistical data, and able to be modelled. In
practice this is not the case; instead we have to deal with uncertainty, which current
approaches do not consider, as well as with incomplete and unverified data.
Furthermore, risk is about people. Their behaviour is not objective or rational and may
follow personal interests. Especially in the risk estimation, evaluation and mitigation
phases, behavioural biases influence assessments and decisions because of knowledge,
perception, personal objectives and herd instincts. Therefore, risk results are biased
without any indication of the direction of the bias. In addition, risk is taken by people
and not by a company or institution: it is not the company that is at risk, but rather the
managers or shareholders of that company. For all these reasons, the methods
developed so far can only be attempts to determine risk, and we believe they are
imprecise, biased and never entirely accurate.

Our approach
  The objective of our approach is to identify the critical risks of a company based on
business process models and security requirements. We assume that business process
models are available and up to date, and we use standard methods and concepts from
the software engineering domain. Our approach will probably not identify all possible
risks, as it concentrates on critical ones.




                              Figure 1. SR risk assessment approach.

  Our approach follows in general the risk management and security requirements
elicitation process: identify assets, identify requirements and assess them (Fig. 1). The
business process model assessment (left side of Figure 1) has three stages: the
identification of critical business processes and business objects from existing
business process models, the definition of the business process security requirements,
and the assessment of the security requirements for each data process point. The
second stage of the assessment can be restarted and is therefore iterative. The IT
process assessment (right side of Figure 1) also consists of three stages: the definition
of the IT security standard process model used, the selection of the security processes
to be assessed, and the assessment of those processes. There is a link between the
requirements assessment and the process assessment, because results of the IT
security process assessment can influence the requirements results if security
objectives or requirements are violated.
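
  As a rough illustration of the assessment logic described above, the following sketch
derives risk indications by checking, for each data process point of a critical business
object, whether its security requirements are satisfied by the implemented controls,
without estimating any event probabilities. It is a simplified reading of the approach
under our own assumptions: the requirement categories follow the four groups named
in the text, but the data structures and example values are hypothetical.

    # Hypothetical, simplified sketch of a requirements-based risk check.
    from dataclasses import dataclass

    @dataclass
    class Requirement:
        category: str        # "system" | "personnel" | "physical" | "execution"
        description: str
        satisfied: bool      # outcome of evaluating the implemented security controls

    def assess_process_point(point_name, requirements):
        """Report unsatisfied requirements as risk indications for one data process point."""
        findings = [r for r in requirements if not r.satisfied]
        return {"process_point": point_name,
                "risk_indications": [f"{r.category}: {r.description}" for r in findings]}

    claims_payment = [
        Requirement("system", "claims data encrypted at rest", True),
        Requirement("personnel", "segregation of duties for claim approval", False),
        Requirement("execution", "four-eyes check before payment release", True),
    ]

    print(assess_process_point("claims payment", claims_payment))
    # -> risk indication: 'personnel: segregation of duties for claim approval'
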
Current work
  Currently, we are completing the validation of our approach. We have chosen to
validate the approach by testing, and have applied it to several real-world examples
within a reinsurance company. Our results support our assertion that risks can be
determined by evaluating security requirements. Further work will concentrate on
discussing validation issues as well as describing how our approach could be
integrated with and utilized in traditional approaches.



 References
 [1] Richard Caralli, James Stevens, Lisa Young, and William Wilson. Introducing
OCTAVE Allegro: Improving the Information Security Risk Assessment Process. The
Software Engineering Institute, 2007.
 [2] Peter Herrmann and Gaby Herrmann. Security requirement analysis of business
processes. Electronic Commerce Research, 6:305–335, 2006.
 [3] International Organization for Standardization (ISO). ISO 27002 Information
technology - Security techniques - Code of practice for information security
management. International Organization for Standardization (ISO), 2005.
 [4] Alexander W. Roehm, Guenther Pernul, and Gaby Herrmann. Modelling secure
and fair electronic commerce. In Proceedings of the 14th Annual Computer Security
Applications Conference, Phoenix, Arizona, Dec. 7-11, 1998. IEEE Computer Society
Press, 1998.
 [5] Andrew Stewart. On risk: perception and direction. Computers & Security,
23:362–370, 2004.
 [6] Lili Sun, Rajendra Srivastava, and Theodore Mock. An information systems
security risk assessment model under Dempster-Shafer theory of belief functions.
Journal of Management Information Systems, 22(4):109 –142, 2006.
 [7] Stilianos Vidalis. A critical discussion of risk and threat analysis methods and
methodologies. Technical Report CS-04-03, University of Glamorgan, Pontypridd,
2004.




   Distilling Privacy Requirements for Mobile Applications
                                    Keerthi Thomas
                                 k.thomas@open.ac.uk

Supervisors                       Prof. Bashar Nuseibeh
                                  Dr. Arosha Bandara
                                  Mr. Blaine Price
Department/Institute              Computing
Status                            Part-time
Probation viva                    After
Starting date                     Oct. 2008


   As mobile computing applications become commonplace, eliciting and analysing users’
privacy requirements associated with these applications is increasingly important. Such
mobile privacy requirements are closely linked to both the physical and socio-cultural context
in which the applications are used.
   Previous research by Adams and Sasse [1] has highlighted how system designers, policy
makers and organisations can easily become isolated from end-users’ perceptions of privacy
in different contexts. For mobile applications, end-users’ context changes frequently and
Mancini et al.’s observations of such users [2] suggest that changes in users’ context result in
changes in the users’ privacy requirements. Omitting these privacy requirements not only
affects the user’s privacy but also has an impact on how well the system is adopted or utilised.
Moreover, the design of technologies influencing privacy management is often considered
and addressed as an afterthought [3], when in fact the guarantees and assurances of privacy
should have been included in the design right from the outset. The aim of my research is
therefore to ensure that privacy requirements of mobile systems are captured early, together
with the specification of the possible variations in these systems’ operating context.
   Privacy requirements have been analysed from different perspectives by the requirements
engineering community. Anton et al. [4] explored the role of policy and stakeholder privacy
values, Breaux and Anton [5] modelled requirements based on privacy laws such as HIPAA,
and Cranor [6] represented requirements using the privacy policies of various online
organisations. Some researchers have modelled privacy as part of a wider modelling effort.
For example, Yu and Cysneiros [7] characterised privacy as a non-functional requirement in
i* using OECD guidelines [8], and Kalloniatis et al. [9] described a security engineering
method to incorporate privacy requirements early in the system development process.
However, I am not aware of any work that specifically focuses on the challenges of
understanding the privacy requirements associated with mobile computing applications.
    Eliciting end-user privacy requirements for mobile applications is both sensitive and
difficult. Questionnaires do not reveal the ‘real’ choices end-users make because the decisions
are influenced by the emerging context in a particular situation. Shadowing users for long
hours is neither practical nor useful as the experience of being under observation is likely to
change the behaviour of the users in ways that invalidate any observed behaviours that relate
to privacy. Mancini et al.’s prior work [2] showed that privacy preferences and behaviours in
relation to mobile applications are closely linked to socio-cultural, as well as to physical,
boundaries that separate different contexts in which the applications are used. From the
literature survey carried out earlier, I am not aware of any requirements engineering process
that specifically supports the elicitation of privacy requirements for mobile or context-aware
systems. Given the complexities and the need to elicit privacy requirements for mobile
systems, the aim of my research is therefore to address the following questions:


      (i) What are the end-user privacy requirements for mobile applications?
      (ii) How can privacy requirements be elicited for mobile applications? What elicitation
      techniques, requirement models and analysis methods are needed in the privacy
      requirements engineering process?
   To address these research questions, I present a systematic approach to modelling privacy
requirements for mobile computing applications where I demonstrate how requirements are
derived (“distilled”) from raw empirical data gathered from studying users of mobile social
networking applications. I propose the use of a user-centric privacy requirements model that
combines relevant contextual information with the users’ interaction and privacy perceptions
of the mobile application. The development of this model was informed by empirical data
gathered from my previous studies of mobile privacy [2]. Finally, I validate my work by using
the model as the basis for extending existing requirements modelling approaches, such as
Problem Frames. I show how the extended Problem Frames approach can be applied to
capture and analyse privacy requirements for mobile social networking applications.



References
[1]   Adams, A. and Sasse, M.A., Privacy issues in ubiquitous multimedia environments: Wake sleeping
      dogs, or let them lie? in Proc. of INTERACT ’99, Edinburgh, 1999, pp. 214-221.
[2]   Mancini, C., et al., From spaces to places: emerging contexts in mobile privacy. in Proc. of the
      11th Int. Conf. on Ubiquitous Computing, Orlando, FL, 2009, pp. 1-10.
[3]   Anton, A.I. and Earp, J.B., Strategies for Developing Policies and Requirements for Secure
      Electronic Commerce Systems. in 1st ACM Workshop on Security and Privacy in E-Commerce,
      Athens, Greece, 2000, pp. unnumbered pages.
[4]   Anton, A.I., Earp, J.B., Alspaugh, T.A., and Potts, C., The Role of Policy and Stakeholder Privacy
      Values in Requirements Engineering. in Proc. of the 5th IEEE Int. Symp. on Requirements
      Engineering, 2001, pp.138.
[5]   Breaux, T.D. and Anton, A.I., Mining rule semantics to understand legislative compliance. in Proc.
      of the 2005 ACM workshop on Privacy in the electronic society, Alexandria, VA, USA, 2005, pp.
      51 - 54
[6]   Cranor, L.F., 1998. The platform for privacy preferences. Communications of ACM 42 (2), 48–55.
[7]   Yu, E. and L.M. Cysneiros. Designing for Privacy and Other Competing Requirements. in 2nd
      Symp. on Requirements Engineering for Information Security (SREIS'02). 2002. Raleigh, North
      Carolina.
[8]   “Inventory of instruments and mechanisms contributing to the implementation and enforcement of
      the OCDE privacy guidelines on global networks” Head of Publications Services, OECD, 2 rue-
      André-Pascal, 75775 Paris Cedex 16, France.
[9]   Kalloniatis, C., Kavakli, E., and Gritzalis, S. Addressing privacy requirements in system design:
      the PriS method Requirements Engineering, Springer London, 13 (3). pp. 241-255.




Understanding the Influence of 3D Virtual Worlds on Perceptions of
                    2D E-commerce Websites
                                         Minh Q. Tran
                                Centre for Research in Computing
                                      The Open University
                                       m.tran@open.ac.uk
                                               Supervisors
            Dr. Shailey Minocha                                         Prof. Angus Laing
      Centre for Research in Computing                                   Business School
            The Open University                                      Loughborough University
           s.minocha@open.ac.uk                                       a.w.laing@lboro.ac.uk

          Dr. Darren Langdridge                                         Mr. Dave Roberts
          Department of Psychology                               Centre for Research in Computing
             The Open University                                       The Open University
          d.langdridge@open.ac.uk                                      d.roberts@open.ac.uk

                                     Department: Computing
                                     Status: Full-time
                                     Probation viva: Passed July 2009
                                     Starting date: October 2008

Introduction
The aim of our research is to understand consumers’ experiences in 3D virtual worlds (VWs) and how
those experiences influence consumers' expectations of 2D e-commerce websites. As consumers
become familiar with the affordances and capabilities of 3D VWs, do their expectations of 2D e-
commerce websites change? The outcome of this research project will be an understanding of
consumers’ experiences in 3D VWs and 2D e-commerce websites. Furthermore, design guidelines will
be developed for e-commerce in 3D VWs and for the integration of 3D VWs with 2D e-commerce
websites.

3D Virtual Worlds
3D VWs are online, persistent, multi-user environments where users interact through avatars [2].
Avatars are digital self-representations of users. Through avatars, users can walk in simulated physical
spaces, talk to other avatars and interact with the environment. This opens up different possibilities for
interaction; both in terms of human-computer interaction (HCI) and also business-to-consumer (B2C)
interactions. Users may be able to browse through virtual markets, shop with their friends and interact
in real-time with vendors [10]. These features suggest shopping in 3D VWs may be more immersive
compared to shopping on websites [7].

E-commerce in Second Life
Second Life (SL) is a 3D VW that is free to use. It is also an open-ended platform;
users of SL are encouraged to create their own content and design their own activities. Users can sell
any content (objects, scripts, animations) that they make. Content can also be bought from others. As a
consequence, SL has developed its own virtual economy [6], including having virtual stores to shop
from (Figure 1).




Figure 1. Stores in Second Life.


Currently, the economy in SL mainly involves virtual items, such as virtual clothes, avatar models,
homes and land. However, there is potential for real business, involving real world items. Some
companies, such as Coca-Cola and Adidas, have already used SL to advertise their products [12]. As
the popularity of 3D VWs grows, more companies will likely make use of 3D VWs for their e-
commerce beyond marketing and advertising. 3D VWs have the potential to become a platform for
buying and selling real items, just as websites are today. However, successful implementation of e-
commerce in 3D VWs will require an understanding of what influences the user experience [11].

Research Objectives
The goal of this research is to investigate affordances of 3D VWs and their influence on consumer’s
perceptions and expectations of 2D e-commerce websites. This understanding will be used to develop
guidelines for designing positive e-commerce experiences in 3D VWs and 2D e-commerce websites.
The research questions are:

         RQ1: What are consumers’ experiences in 3D VWs?
         RQ2: What are the perceptions and expectations of 2D e-commerce websites among
         consumers who have experience of 3D VWs?
         RQ3: What are the differences in expectations and behaviours between consumers in 3D VWs
         and 2D e-commerce websites?

Online Service Encounter
Consumers’ experiences are based on what occurs during the service encounter. The service encounter
refers to all interactions between a consumer and a service provider for the exchange of a product or
provision of a service. According to the service encounter model, a full understanding of the experience
involves looking at what happens before, during and after a purchase (Figure 2).




Figure 2. Model of the service encounter [3].

Furthermore, consumers now have the option between different commerce channels (websites, high
street, telephone, etc.). Therefore, consumers’ experiences are not based only on the performance of
individual channels, but also how well the channels are integrated to provide a positive and seamless
experience. This research focuses on two commerce channels in particular, 3D VWs and 2D websites.

Affordances of 3D VWs
3D VWs support the service encounter in different ways compared to 2D websites. For example,
having products rendered in 3D can improve product ‘diagnosticity’ [8]. Diagnosticity refers to how
easily a consumer can judge a product to fit their needs. An interactive 3D model of products gives
users more information about its form and function. Therefore, users may be able to make informed
purchase decisions when shopping in VWs because they have a better idea of what the product is like.
Another advantage is the multi-user and synchronous environment. VWs produce the sense of ‘being
there’, also referred to as ‘presence’ [13]. A sense of ‘being there’ with others is also possible because
avatars are located in the same virtual space; users can ‘see’ each other. As a result, the e-commerce
experience has a social dimension that is not experienced when shopping on websites.

Affordances of 2D Websites
Websites have their own advantages that VWs do not. Presently, websites can provide more
information compared to VWs as they use text effectively [5]. The advantage of text is that it can
describe many details about a product, such as specifications and warranties, which cannot be easily
conveyed through images or 3D models. The web also has the advantage of being faster than 3D VWs
because of its lower bandwidth and CPU requirements.




Methodology
The methodology of this research project is empirical and qualitative. Three studies involving users are
planned (Figure 3). The first two studies are based on in-depth interviews. The interviews will be
conducted in SL. During the interviews, participants are encouraged to describe their own shopping
experiences in detail and from their own subjective viewpoint. The interview technique is based on
phenomenology [4]. Phenomenological interviews, and subsequent phenomenological analysis, allow
the researcher to obtain the structure and content of experience. During the interviews, each participant
is asked to describe the pre-purchase, purchase and post-purchase interactions from a service encounter.
The data consists of descriptions of shopping experiences, including behaviours, thoughts and feelings.

For this project, data analysis includes both descriptive phenomenological analysis [4] and a general
thematic analysis [1]. A descriptive phenomenological analysis of each interview produces use cases
(or individually structured narratives). Thematic analysis produces a set of themes relating to
affordances and user experience. The use cases and themes provide grounding to reason about design
implications and design guidelines. Design guidelines will be validated through a third study. The
guidelines will be evaluated by users who have experience creating content in 3D VWs and websites.
Part of the validation study will involve making the guidelines usable for the intended audience of
designers and marketers.




Figure 3. Project methodology

Preliminary Findings
The first study is now complete. A list of themes based on affordances, and a set of use cases, are being
compiled. The aim is to provide a comprehensive list of affordances in 3D VWs for designers to think
about when designing e-commerce systems. The long-term goal is to provide guidance on how to best
use these affordances to create positive experiences. Some affordances identified so far are the ability
to:
• navigate through 3D environments facilitated by the spatial metaphor in a 3D VW
• browse pre-arranged displays similar to a real-world store
• interact with others in real-time as avatars
• blend the 3D virtual world experience with 2D websites

Through further analysis, a set of use qualities and their design implications will be derived. Use
qualities relate to emotional aspects (sensations, feelings, meaning-making) [9]. For example, some use
qualities that characterize the 3D virtual world experience are:



•   Disembodied presence: presence and interaction in VWs requires a combination of interaction
    metaphors, some from avatar-centred (or game-based) interactions and some from pointer-based
    (WIMP-desktop) interactions.
•   Touristy shopping: VWs are still a relatively new technology. Consumers are open to the idea of
    simply enjoying the sights and sounds by visiting new stores. The element of discovery and
    wonder partly contributes to the positive feelings associated with the shopping experience.
•   Effortful: consumers perceive the shopping experience as requiring non-trivial effort. This may be
    due to the difficulty of finding stores or the time required to travel through the virtual world
    because of ‘lag’. The way that consumers describe the shopping experience in 3D VWs suggests
    shopping is more difficult in VWs compared to shopping on websites.
•   Socially situated: consumers are not alone in VWs. The motivation and consequence of
    consumers’ actions are influenced by their social network and activity. For example, consumers
    often choose to buy products because they see someone else with the product. Or, they buy
    products so that they can share them with others in the virtual world.

Further Work
The second and third empirical studies will be completed within the next year. The final outcome will
be design guidelines for usability of e-commerce in VWs and on websites. Additionally, the guidelines
will address how to integrate 3D and 2D e-commerce environments for a positive and seamless
consumer experience. The outcome of this research will benefit designers and marketers by providing
guidance and a framework for designing positive e-commerce experiences. Consumers will also benefit
by having e-commerce systems that meet their requirements and address their expectations.

References
1. Braun, V. and Clarke, V. Using thematic analysis in psychology. Qualitative research in
    psychology 3(2), 2006, 77–101.
2. Castronova, E. Synthetic Worlds - The Business and Culture of Online Games. University of
    Chicago Press, London, 2005.
3. Gabbott, M. and Hogg, G. Consumers and Services. Wiley UK, 1998.
4. Giorgi, A. P. and Giorgi, B. Phenomenological psychology. In Willig, C. and Rogers. W.S. eds.
    The SAGE Handbook of Qualitative Research in Psychology. SAGE Ltd, London, 2008.
5. Goel, L. and Prokopec, S. If you build it will they come?—An empirical investigation of consumer
    perceptions and strategy in VWs. Electronic Commerce Research, 9(2), 115-134.
6. Hale, T. 2009 End of Year Second Life Economy Wrap up (including Q4 Economy in Detail).
    Retrieved March 10, 2010, from Second Life Official Blog:
    http://blogs.secondlife.com/community/features/blog/2010/01/19/2009-end-of-year-second-life-
    economy-wrap-up-including-q4-economy-in-detail.
7. Hemp, P. Are You Ready for E-tailing 2.0? Harvard Business Review 84, 1028-29.
8. Jiang, Z. and Benbasat, I. Virtual Product Experience: Effects of Visual and Functional Control of
    Products on Perceived Diagnosticity and Flow in Electronic Shopping. Journal of Management
    Information Systems, 21(3), 111-147.
9. Löwgren, J. and Stolterman, E. Thoughtful Interaction Design. The MIT Press, Cambridge, MA,
    2004.
10. Maamar, Z. Commerce, E-Commerce, and M-Commerce: What Comes Next? Communications of
    the ACM 46, 12, 2003, 251-257.
11. Petre, M., Minocha, S. and Roberts, D. Usability Beyond the Website: an Empirically-Grounded
    E-commerce Evaluation Instrument for the Total Customer Experience. Behaviour and
    Information Technology, 25(2), 189-203.
12. Rymaszewski, M., Au, W. J., Ondrejka, C., Platel, R., Gorden, S. V., Cezanne, J., Batston-
    Cunningham, B., Krotoski, A., Trollop, C. and Rossignol, J. Second Life: The Official Guide
    (Second Ed.). Wiley Publishing Inc, Indiana, 2008.
13. Taylor, T. Living Digitally: Embodiment in VWs. In R. Schroeder, The Social Life of Avatars:
    Presence and Interaction in Shared Virtual Environments. Springer-Verlag London Ltd., London,
    2002, 40–62.

Note: All studies involving participants have been approved by The Open University’s Human
Participants and Materials Ethics Committee (HPMEC). The study protocol is consistent with
guidelines from the British Psychological Society (http://www.bps.org.uk) and Second Life
Community Standards (http://secondlife.com/corporate/cs.php).




 Supporting Reflection about Web Resources within Mash-
               Up Learning Environments

                              Thomas Daniel Ullmann
                               t.ullmann@open.ac.uk

Supervisors          Peter Scott
                     Fridolin Wild
Department/Institute Knowledge Media Institute – The Open University
Status               Fulltime
Probation viva       Before
Starting date        October 2010



The proposed PhD thesis addresses the problem of how to empower users to reflect
about resources, helping them to make informed decisions. The goal of the PhD is to
develop a framework for a mash-up learning environment that takes into account
users’ reflection about resources.

Mashups are usually seen as software applications that merge separate APIs or data
sources (Zang, Rosson, and Nasser 2008). They compose new applications based on
existing data services and user interfaces. Mashups are “a combination of pre-existing,
integrated units of technology, glued together to achieve new functionality, as
opposed to creating that functionality from scratch” (Hartmann, Doorley, and
Klemmer 2006). They are the manifestation of the programmable web (Maximilien,
Ranabahu, and Gomadam 2008).
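
To illustrate the idea of gluing pre-existing services together rather than building the
functionality from scratch, the following minimal sketch (the feed URLs and JSON
field names are placeholders, not part of the proposed framework) merges two existing
data sources into one combined view:

    # Minimal mash-up sketch: combine two pre-existing data services into a new view.
    # The URLs and JSON fields below are placeholders for real, live APIs.
    import json
    from urllib.request import urlopen

    def fetch_json(url):
        """Fetch and decode a JSON resource from an existing web service."""
        with urlopen(url) as response:
            return json.load(response)

    def mash_up(resource_feed_url, rating_feed_url):
        """Glue two services together: annotate learning resources with ratings."""
        resources = fetch_json(resource_feed_url)  # e.g. [{"id": 1, "title": "..."}]
        ratings = fetch_json(rating_feed_url)      # e.g. [{"resource_id": 1, "stars": 4}]
        stars_by_id = {r["resource_id"]: r["stars"] for r in ratings}
        return [dict(res, stars=stars_by_id.get(res["id"])) for res in resources]

    # Example call with placeholder endpoints:
    # combined = mash_up("https://example.org/resources.json",
    #                    "https://example.org/ratings.json")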

Learners faced with the wealth of available learning resources need strategies to deal
with its complexity. The abilities to reflect about information, and to rate and review it,
seem to be important skills for coping with this. Many tools are available on the web
that address these challenges. For example, search engines, one of the major backbones
of the web, deliver a ranked result set of more or less relevant information.
Recommendation services aggregate the opinions of users into top lists of items, or use
collaborative filtering mechanisms to make predictions about users’ future interests.
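
As an illustration of the collaborative filtering mechanism mentioned above, the
following deliberately simplified, user-based sketch predicts how much a learner might
value an unseen resource from the ratings of learners with similar tastes. The ratings
and the similarity measure are made-up illustrations; real recommendation services
are considerably more sophisticated.

    # Simplified user-based collaborative filtering over made-up example ratings.
    ratings = {
        "alice": {"wiki-page": 5, "screencast": 3, "podcast": 4},
        "bob":   {"wiki-page": 4, "screencast": 2, "podcast": 5},
        "carol": {"wiki-page": 1, "screencast": 5},
    }

    def similarity(a, b):
        """Crude similarity: inverse of the mean absolute rating difference on shared items."""
        shared = set(ratings[a]) & set(ratings[b])
        if not shared:
            return 0.0
        diff = sum(abs(ratings[a][i] - ratings[b][i]) for i in shared) / len(shared)
        return 1.0 / (1.0 + diff)

    def predict(user, item):
        """Weight other users' ratings of the item by their similarity to the given user."""
        neighbours = [(similarity(user, other), r[item])
                      for other, r in ratings.items()
                      if other != user and item in r]
        total = sum(sim for sim, _ in neighbours)
        return sum(sim * rating for sim, rating in neighbours) / total if total else None

    print(predict("carol", "podcast"))  # prediction based on alice's and bob's ratings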

While these services lack connectivity and do not explicitly address reflective practice,
Mashup Personal Learning Environments (MUPPLEs) (Wild, Mödritscher, and
Sigurdarson 2008) enable learners to construct their own learning space through
facilities to mash up services and tools from different sources to support collaborative
and individual learning activities.

Research carried out in the context of reflection (e.g. Dewey 1933; Boud, Keogh, and
Walker 1985; Schön 1983; Moon 1999, 2004) finds its application in mash-up personal
learning environments in the form of indicators (Glahn 2009). Indicators are usually
small widgets embedded in a learning system that present information to learners, for
example about their activity level or a performance measure. While indicators focus
on the visualization of interaction footprints, methods from evaluation research
(Thierau and Wottawa 1990), especially from qualitative (Stake, Denzin, and Lincoln
2005) and quantitative (Neuman 2005) research, are also considered as possible
reflection points about (web) resources.

The goal is to provide users with these functionalities in a mash-up environment. In
order to reflect about a topic, the proposed system takes into account manually added
indicators as well as automatically added criteria that foster reflection. The latter are
partly derived from the data services and tools of the Science 2.0 infrastructure (Wild
and Ullmann 2009, 2010) for researchers in technology-enhanced learning.


References:
Boud, David, Rosemary Keogh, and David Walker. 1985. Reflection: Turning
       Experience into Learning. Routledge, April 1.
Dewey, J. 1933. How we think: A restatement of the relation of reflective thinking to
       the educative process. DC Heath Boston.
Glahn, Christian. 2009. Contextual support of social engagement and reflection on the
       Web. http://dspace.ou.nl/handle/1820/2062.
Hartmann, Björn, Scott Doorley, and Scott R Klemmer. 2006. Hacking, Mashing,
       Gluing: A Study of Opportunistic Design and Development.
       http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.66.1603.
Maximilien, E.M., A. Ranabahu, and K. Gomadam. 2008. An Online Platform for
       Web APIs and Service Mashups. Internet Computing, IEEE 12, no. 5: 32-43.
       doi:10.1109/MIC.2008.92.
Moon, Jennifer A. 1999. Reflection in learning & professional development.
       Routledge.
———. 2004. A handbook of reflective and experiential learning. Routledge, June 15.
Neuman, W. L. 2005. Social research methods: Quantitative and qualitative
       approaches. Allyn and Bacon.
Schön, D. A. 1983. The reflective practitioner. Basic Books New York.
Stake, R. E, N. K. Denzin, and Y. S. Lincoln. 2005. The Sage handbook of qualitative
       research. Sage Thousand Oaks, CA.
Thierau, H., and H. Wottawa. 1990. Lehrbuch Evaluation. Bern, Stuttgart, Toronto:
       Huber.
Wild, Fridolin, Felix Mödritscher, and Steinn Sigurdarson. 2008. Designing for
       Change: Mash-Up Personal Learning Environments. eLearning Papers 9.
       http://www.elearningeuropa.info/files/media/media15972.pdf.
Wild, Fridolin, and T. D. Ullmann. 2009. Science 2.0 Mash-Ups. STELLAR
       Deliverable 6.3.
       http://www.stellarnet.eu/kmi/deliverables/20100120_stellar___d6-3.pdf.
———. 2010. The STELLAR Science 2.0 Mash-Up Infrastructure. In Accepted paper
       for the 10th IEEE International Conference on Advanced Learning
       Technologies. Sousse, Tunisia.
Zang, Nan, Mary Beth Rosson, and Vincent Nasser. 2008. Mashups: who? what?
       why? In CHI '08 extended abstracts on Human factors in computing systems,
       3171-3176. Florence, Italy: ACM. doi:10.1145/1358628.1358826.
       http://portal.acm.org/citation.cfm?id=1358826.





    Local civic governance using online media – a case of
   consensual problem solving or a recalcitrant pluralism?

                               Rean van der Merwe
                           r.vandermerwe@open.ac.uk

Supervisors          Anthony Meehan
                     Engin Isin
Department/Institute Computing, HCI
                     Centre for citizenship, identities and governance
Status               Full time
Probation viva       After
Starting date        October 2008


This presentation reports on a component of a PhD research project exploring the role
of online social media in local governance. It discusses the investigation and analysis
of distinct patterns of 'governance conversation' observed on a discussion list that was
developed and maintained to support local governance. One interesting finding is that
making ‘binding decisions’, which has been seen as a key attribute of deliberative
democratic processes (Gutmann & Thompson, 2004), is almost entirely absent from
the observed online interactions. Nonetheless, the interactions appear to be relevant
and useful to the more broadly deliberative process of local governance.

The investigation makes a case study of a small, geographically co-located
community - where residents make use of simple online tools to discuss issues of
local importance. In this sense, the case study presents an example of "neighbourhood
democracy" (Leighninger, 2008). However, it should be distinguished from other
examples of online neighbourhood democracy, or more broadly online deliberative
governance, where the research focus is on the interaction of citizens with
government, and where policy formulation in its various forms is both key object and
output of communication. In this instance, the online discussion spaces were
conceived, set up and are maintained entirely as a spontaneous volunteer effort by
members of the community; formal government, e.g. the city municipality, is neither
the object of, nor a significant participant in, the conversations. Dialogue is between
residents and largely concerns how they and their Residents Association might
directly resolve local issues. Accordingly, residents understand the problems under
discussion well and are often personally affected - and so highly motivated to
participate in governance action.

Case selection logic follows two principles discussed by Yin (2003) which may
initially appear contradictory – the case is both typical of villages and neighbourhoods
of a given size that exist throughout the world, and relatively unusual in what appears
to be a successful ‘bottom up’ implementation of online media to support local, direct
governance. The scope of this study is to investigate the sorts of interaction that
practically occur as a result, the relationship between online tools and social action,
and the potential impact that the online interactions have on local governance.



The study draws on a combination of online discussion archives, field notes and
interviews with key participants, and follows an approach based on the Structured
Case methodological framework (Carroll & Swatman, 2000). The development of
theory has much in common with the grounded theory methodology (Heath & Cowley,
2004), though structured case in particular makes provision for an initial conceptual
framework, to be refined, extended and tested through grounded observation. The
initial framework employed here has two significant components: an understanding of
deliberative governance as a much broader process than rational decision-making
dialogue; and the recognition that deliberation may equally be valued as instrumental
or expressive, a process potentially leading to consensual decision making or to the
accommodation of pluralism (Gutmann & Thompson, 2004).

Analysis of the discussion archives presents five patterns of ‘governance conversation’,
all of which play a significant role in local governance within the case community.
Considering the size and nature of the sample, the analysis does not propose anything
near a comprehensive typology. Instead, the patterns are used as a mechanism for
analysing and discussing this particular case and the range of contributions therein.

Briefly, the five patterns are:
• Announcement – participants share governance information or advertise an event.
• Feedback – participants provide or request information in response to a governance
   initiative.
• Coordination – participants coordinate a local response to an externally initiated
   governance process.
• Deliberative mediation – participants informally mediate the direct resolution of
   local governance problems.
• Deliberative management – participants engage in sustained, pluralist discussion of
   a complex governance problem.

In reference to the initial theoretical framework, the ‘announcement’, ‘feedback’,
‘coordination’ and ‘deliberative mediation’ patterns make the most evident
instrumental contributions, but also provide less overt expressive contributions.
‘Deliberative management’ most clearly supports expressive dialogue. In turn, the
expressiveness of deliberation appears to be instrumental to the shared understanding
required to manage inherently pluralist, complex governance problems. The evidence
suggests that the online discussions are driven by a combination of the two modes of
interaction, the instrumental and the expressive. The findings support Gutmann and
Thompson’s (2004) view that a complete framework of deliberative governance must
integrate the two perspectives.

Though the investigation does not show evidence of overt decision-making, there is a
strong case that the online conversations significantly support governance action. It
appears that the online discussions rarely “create” consensus, but are effective in
supporting action where some level of implicit consensus exists, as we observed in the
‘feedback’, ‘coordination’ and ‘deliberative mediation’ patterns. Furthermore, online
deliberation appeared to be particularly suited to managing the sometimes unavoidable
pluralism that complex issues introduce to local governance (Cohen, 1998). The case
analysis supported not only the claim that expressive communication online creates
mutual respect (Gutmann & Thompson, 2004), but also that it potentially allows
participants to identify shared interests with respect to an issue, which makes a mutually acceptable
management solution possible. There is further a case that, in the context of local
governance, the asynchronous and responsive nature of the online medium (Wellman
et al., 2003) seems particularly suited to supporting such an ad hoc, pluralist
management process.

While this single case study presents a very specific context of deliberation, the
patterns of “governance conversation” observed, and the underlying themes of the
issues they pertain to, are very possibly common to the deliberations of communities
the world over. Further, the online tools used by the case community are relatively
unsophisticated, widely used and easily adopted. The case demonstrates the potential
value of an infrequently investigated context of online deliberation – that of citizen-to-
citizen deliberation pertaining to geographically local issues – and additionally of a
broader conception of the role of the ‘online’ in local deliberation in particular, where
formal decision making is frequently over-privileged in existing research.

Where the evolved theoretical frame is applied to the technology supporting
governance interaction, it seems that an instrumental view of deliberation predisposes
to an instrumental view of technology - as a "tool" primarily to reduce the
coordinative overheads (Cordella, 1997) associated with direct deliberative decision
making, and potentially to assist in the process of forming consensus. An expressive
view instead encourages the researcher to consider the extent to which technology
fulfils a broader social function by "extending" the public sphere (Klein & Huynh,
2004), creating an environment where the plural values and meaning underlying
issues can be understood. Rather than proposing one or the other as "ideal," this
project sets out to understand how interaction practically happens, given the
theoretical perspective we have outlined, and what this means for the toolsets we
design to support the process.

References
Carroll, J. M., & Swatman, P. A. (2000). Structured-case: a methodological
        framework for building theory in information systems research. Eur. J. Inf.
        Syst., 9(4), 235-242.
Cohen, J., & Sabel, C. (1997). Directly Deliberative Polyarchy. European Law
        Journal, 3(4), 313-340.
Cordella, A., Simon, K.A. (1997). The Impact of Information Technology on
        Transaction and Coordination Cost. Paper presented at the Conference on
        Information Systems Research in Scandinavia
Gutmann, A., & Thompson, D. F. (2004). Why deliberative democracy? : Princeton
        University Press.
Heath, H., & Cowley, S. (2004). Developing a grounded theory approach: a
        comparison of Glaser and Strauss. International Journal of Nursing Studies,
        41, 141-150.
Klein, H. K., & Huynh, Q. H. (2004). The critical social theory of Jürgen Habermas
        and its implications for IS research. In J. Mingers & L. P. Willcocks (Eds.),
        Social Theory and Philosophy for Information Systems (pp. 157-237): Wiley.




Leighninger, M. (2008). The promise and challenge of Neighbourhood Democracy.
       Deliberative Democracy Consortium.
Wellman, B., Quan-Haase, A., Boase, J., Chen, W., Hampton, K., Díaz, I., et al.
       (2003). The Social Affordances of the Internet for Networked Individualism.
       Journal of Computer-Mediated Communication, 8(3).
Yin, R. K. (2003). Case study research: Design and methods. London: Sage
       Publications.




       Analysis of conceptual metaphors to inform music
                       interaction designs
                                Katie Wilkie
                           k.l.wilkie@open.ac.uk

Supervisors          Dr Simon Holland
                     Dr Paul Mulholland
Department/Institute Music Computing
Status               Part-time
Probation viva       After
Starting date        April 2008

Music is interwoven through many facets of our daily lives and experiences,
from the deep to the trivial. It can be both an art form providing a conduit by
which emotions and ideas can be communicated, and a means to
communicate personal tastes through, for example, the choice of a mobile
ringtone. Despite the ubiquity of music, opportunities for non-experts to
interact with music in meaningful ways, to understand it and to affect it by,
for example, creating and manipulating melodies and harmonic progressions,
are limited.
Popular and pervasive though music is, understanding and analysing the
structural properties of musical artifacts often requires knowledge of domain
terminology and notation. Such specialist knowledge is generally restricted to
highly trained domain experts who have pursued a path of detailed academic
study. In particular, musical concepts such as harmonic progression and voice
leading, which make use of a number of different terms and notations to
describe various parameters and aspects, can be difficult to understand and
describe. Furthermore, providing ways of interacting with music that are
sufficiently expressive for experts whilst still being usable by non-experts
remains an open challenge. We hypothesise that if we can represent this
specialist knowledge in a form that exploits pre-existing and universally held
sensory-motor experiences, we will be able to lower some of the barriers to
musical expression. Thus we believe that music interactions designed in this
manner would lessen the requirement for specialist domain knowledge and
be more intuitive to both domain experts and novices alike.
The identification of image schemas, exposed through linguistic constructs,
provides a promising foundation for this work. Image schemas are defined by
Johnson (2005) as “recurring patterns of our sensory-motor experience” where
the experiences Johnson is referring to are those of interacting with other
bodies, space and forces within our environment. Johnson further
hypothesises that these image schemas can be applied to other, often abstract,
domains through the creation of conceptual metaphors, enabling us to
develop our understanding of more complex abstract concepts.
Image schema and conceptual metaphor theories have already been applied
to a number of different domains such as arithmetic (Lakoff, Nunez 2007),
musical concepts (Saslaw 1996, 1997, Zbikowski 1997a, 1997b, Brower 2000,
Larson 1997, Johnson 1997, Johnson, Larson 2003, Eitan, Granot 2006, Eitan,
Timmers 2006), user interface design (Hurtienne, Blessing 2007) and music
interaction design (Antle et al. 2008, 2009). In the domain of user interface
design for example, Hurtienne and Blessing (2007) carried out experiments
attempting to determine whether user interface controls which were
configured to support simple conceptual metaphors such as MORE IS UP, a
metaphorical extension of the UP-DOWN image schema, would be more
intuitive to use. Their results do appear to support this hypothesis to an
extent; however, only a small number of user interface controls and
conceptual metaphors were tested.
In the domain of music theory, work by Saslaw (1996, 1997), Zbikowski
(1997a, 1997b), Brower (2000), Larson (1997), Johnson (1997, Johnson, Larson
2003) and Eitan et al. (Eitan, Granot 2006, Eitan, Timmers 2006) has used
image schemas and conceptual metaphors in an attempt to increase our
theoretical understanding of musical concepts. This has yielded promising
results indicating that musical concepts can be understood in terms of image
schemas and conceptual metaphors.
Antle et al. (2008, 2009) designed an interactive sound generation system
based on embodied metaphors that allowed users to generate sounds and
modify simple sound parameters through body movement. They ran a series
of experiments attempting to establish whether this approach to interaction
design enhanced the ability of children to learn about sound concepts.
Although the results were inconclusive, they did highlight the importance of
discoverability of the embodied metaphors used in the interaction model.
This research draws upon these works, aiming to establish if the conceptual
metaphors elicited from dialogues between musicians discussing various
musical concepts can be used to inform interaction designs for
communicating information about, expressing and manipulating complex
musical concepts such as harmony and melody. Thus, the specific questions
this research aims to address are as follows:
   1. How can conceptual metaphors aid our understanding of the musical
      concepts of pitch, melody and harmony?
   2. How can the conceptual metaphors identified be used to inform and
      evaluate the design of music interactions for communicating
      information about and manipulating pitch, melody and harmony?
Methodology
In order to address the question of the ways in which conceptual metaphors
aid our understanding of the musical concepts of pitch, melody and harmony,
we must first identify the conceptual metaphors that experienced musicians
use to understand, define and describe such phenomena. A series of studies
have been planned involving musicians from both classical and popular
traditions. The participants will be provided with musical artifacts in different
representation formats (e.g. musical score, audio file and piano roll) and
asked to discuss aspects of the artifacts in order to elicit a dialogue which can
then be analysed to identify the conceptual metaphors in use. Once a
collection of commonly used musical conceptual metaphors has been
identified, it is planned to validate these with a wider audience through the
use of an online questionnaire.
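
As a purely illustrative sketch, and not part of the planned studies, the short
Python fragment below shows one naive way a dialogue transcript could be given a
first automatic pass: counting occurrences of vocabulary that may signal particular
image schemas. The lexicon, schema names and example sentence are invented for
illustration; in practice the identification of conceptual metaphors would be carried
out manually by the analyst.

    from collections import Counter

    # Illustrative lexicon only: cue words that might signal an image schema.
    SCHEMA_LEXICON = {
        "VERTICALITY": {"up", "down", "rises", "rising", "falls", "falling", "high", "low"},
        "PATH": {"moves", "towards", "away", "leads", "goes", "reaches"},
        "CONTAINER": {"in", "inside", "within", "enters", "leaves"},
    }

    def candidate_schemas(transcript_lines):
        """Count cue words for each candidate image schema in a transcript."""
        counts = Counter()
        for line in transcript_lines:
            words = {w.strip(".,;:!?").lower() for w in line.split()}
            for schema, cues in SCHEMA_LEXICON.items():
                counts[schema] += len(words & cues)
        return counts

    demo = ["The melody rises and then falls back down towards the tonic."]
    print(candidate_schemas(demo))  # VERTICALITY and PATH cues dominate here

Such counts could at most suggest passages worth closer manual coding; they are no
substitute for the qualitative analysis of the dialogues described above.
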
The second research question, regarding the use of conceptual metaphors to
evaluate and inform music interaction designs, will be addressed by firstly
evaluating a number of existing music interaction designs using the identified
musical conceptual metaphors. The results of these evaluations will be used to
generate a series of guidelines for designing music interactions. In order to
validate the guidelines, example music interactions will be developed based
on the guidelines and subsequently evaluated with participants to establish
their suitability.
A summary of the work plan for these tasks is provided in the table below.
Dates                   Task
May 2010 – Dec 2010     Identify and validate the musical conceptual
                        metaphors used by musicians through a series
                        of studies and an online questionnaire.
Jan 2011 – Apr 2011     Evaluate existing music interaction designs
                        using the identified musical conceptual
                        metaphors and establish a series of design
                        guidelines/patterns for designing future music
                        interactions.
May 2011 – Dec 2011     Implement a number of small-scale solutions
                        based on the defined design guidelines and
                        evaluate these solutions to further improve the
                        guidelines.
Jan 2012 – Jun 2013     Write-up.


At this stage, one study has already been completed (Wilkie, Holland and
Mulholland 2009) and further studies are being planned.
Contributions
It is envisaged that this research will provide the following contributions to
the field:
     1. Increased knowledge of how conceptual metaphors aid understanding
        of musical concepts such as pitch, melody and harmony. This will be
        achieved through identifying and validating the conceptual
        metaphors used by musicians when discussing various aspects of
        music.
     2. Some preliminary indication of how different musical representation
        formats affect and align with the conceptual metaphors elicited
        during discussions.
     3. Improved knowledge of how musical conceptual metaphors can be
        used to evaluate and inform the designs of intuitive music
        interactions. This will be achieved through the development of a
        series of design guidelines aimed at assisting designers to make
        decisions about the most appropriate manner for communicating
        information about and manipulating specific musical parameters.
References
ANTLE, A.N., CORNESS, G. and DROUMEVA, M., 2009. Human-computer-
intuition? Exploring the cognitive basis for intuition in embodied interaction.
International Journal of Arts and Technology, 2(3), 235-254.
ANTLE, A.N., DROUMEVA, M. and CORNESS, G., 2008. Playing with the
sound maker: do embodied metaphors help children learn? Proceedings of the
7th international conference on Interaction design and children, 2008, ACM pp178-
185.
BROWER, C., 2000. A cognitive theory of musical meaning. Journal of Music
Theory, 44(2), 323-379.
EITAN, Z. and GRANOT, R.Y., 2006. How Music Moves: Musical Parameters
and Listeners' Images of Motion. Music Perception, 23(3), 221-247.
EITAN, Z. and TIMMERS, R., 2006. Beethoven’s last piano sonata and those
who follow crocodiles: Cross-domain mappings of auditory pitch in a musical
context, Proceedings of the 9th International Conference on Music Perception and
Cognition, 2006, pp875-882.
HURTIENNE, J. and BLESSING, L., 2007. Design for Intuitive Use - Testing
Image Schema Theory for User Interface Design, Proceedings of the 16th
International Conference on Engineering Design, 2007, pp1-12.
JOHNSON, M., 2005. The philosophical significance of image schemas. In: B.
HAMPE and J. GRADY, eds, From Perception to Meaning: Image Schemas in
Cognitive Linguistics. Berlin: Walter de Gruyter, pp. 15-33.
JOHNSON, M., 1997. Embodied Musical Meaning. Theory and Practice, 22-23,
95-102.
JOHNSON, M.L. and LARSON, S., 2003. Something in the Way She Moves-
Metaphors of Musical Motion. Metaphor and Symbol, 18(2), 63-84.
LAKOFF, G. and NUNEZ, R.E., 2000. Where Mathematics Comes From. Basic
Books.
LARSON, S., 1997. Musical forces and melodic patterns. Theory and Practice,
22-23, 55-71.
SASLAW, J., 1996. Forces, Containers, and Paths: The Role of Body-Derived
Image Schemas in the Conceptualization of Music. Journal of Music Theory,
40(2), 217-243.
SASLAW, J.K., 1997. Life Forces: Conceptual Structures in Schenker’s Free
Composition and Schoenberg's The Musical Idea. Theory and Practice, 22-23,
17-34.
WILKIE, K., HOLLAND, S. and MULHOLLAND, P., 2009. Evaluating
Musical Software Using Conceptual Metaphors, Proceedings of the 23rd British
Computer Society Conference on Human Computer Interaction, 2009, British
Computer Society pp232-237.
ZBIKOWSKI, L.M., 1997a. Conceptual Models and Cross-Domain Mapping:
New Perspective on Theories of Music and Hierarchy. Journal of Music Theory,
41(2), 193-225.
ZBIKOWSKI, L.M., 1997b. Des Herzraums Abschied: Mark Johnson's Theory
of Embodied Knowledge and Music Theory. Theory and Practice, 22-23, 1-16.







Issues and techniques for collaborative music making
               on multi-touch surfaces

                               Anna Xambó
                           a.xambo@open.ac.uk
Supervisors              Robin Laney
Department/Institute     Department of Computing
Status                   Visiting research student (4 months)
Probation viva           -
Starting date            -


A range of applications exist for collaborative music making on multi-touch
surfaces. Some of them have been highly successful, but currently there is no
systematic way of designing them, to maximize collaboration for a particular
user group. We are especially interested in applications that will engage
novices and experts. Traditionally, the challenge with collaborative music
instruments is to satisfy the needs of both [1]. For that purpose, we developed
a collaborative music making prototype for multi-touch surfaces and evaluated
its creative engagement.

Applications for musical multi-touch surfaces are not new. A pioneering work
is the ReacTable [2, 3], which allows a group of people to share control of a
modular synthesizer by manipulating physical objects on a round table. Iwai’s
Composition on the Table [4] allows users to create music and visuals by
interacting with four tables which display switches, dials, turntables and
sliders. Stereotronic Multi-Synth Orchestra [5] uses a multi-touch interface
based on a concentric sequencer where notes can be placed. What is less
addressed is the evaluation of creative engagement in these applications.
There are numerous theoretical accounts of the nature of emotional
engagements with art and artefacts. Current models are based on a
pragmatist view, which conceptualises the aesthetic and affective value of an
object as lying not in the object itself, but in an individual’s or a group’s rich
set of interactions with it [6, 7]. In the context of pleasurable creative
engagement and the collective composition of music, Bryan-Kinns et al. [8]
see attunement to others’ contributions as the central principle of creative
engagement. The phenomenon of full personal immersion in an activity, also
known as ’flow’ [7], has been extended to groups as a means of heightening
group productivity [9].

Our approach is, first, to study the issues and techniques of multi-user
instruments and multi-touch applications in general, second, to design a
simple application in an initial attempt to clearly analyse some of these issues,
and third, to evaluate its creative engagement. For that purpose, a prototype
was built which allowed groups of up to four users to express themselves in
collaborative music making using pre-composed materials. By
keeping the prototype minimal, we were able to investigate the essential
aspects of engaging interaction.

Case studies were video recorded and analysed using two techniques derived
from Grounded Theory (GT) and Content Analysis (CA). GT is a qualitative
research method employed in the social sciences that derives theoretical
explanations from the data without having hypotheses in mind [10]; for the GT
analysis, we adopted an open coding strategy of identifying key moments of the
video interactions; grouping the codes by concepts; and generating general
explanations from the categorization of the concepts. Given that this approach
is based on creative interpretation, we added more evidence by
complementing GT with CA. Content Analysis is defined by Holsti (1969)
as ”any technique for making inferences by objectively and systematically
identifying specified characteristics of messages” [10]. This definition includes
content analysis of text, videos, music or drawings. There are varied
approaches to CA, using quantitative techniques, qualitative techniques or both. Our
approach is derived from ethnographic content analysis, or qualitative content
analysis [11], an approach to documents that emphasises the role of the
investigator in the construction of the meaning of texts. We took the same steps
as in the open coding, but in the first step we instead used structured codes to
help us identify key points of the video-recorded interactions.
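
By way of illustration only, the following Python sketch shows one possible way of
representing time-stamped video annotations and tallying the structured codes used in
such an analysis. The code labels and data are invented; this is not the tooling used
in the study.

    from collections import Counter
    from dataclasses import dataclass

    @dataclass
    class Annotation:
        start: float       # seconds into the video
        end: float
        participant: str
        code: str          # e.g. "imitates-partner", "claims-territory" (invented labels)

    def code_frequencies(annotations):
        """Tally how often each structured code was applied."""
        return Counter(a.code for a in annotations)

    def codes_by_participant(annotations):
        """Group code tallies per participant, e.g. to compare engagement."""
        per_participant = {}
        for a in annotations:
            per_participant.setdefault(a.participant, Counter())[a.code] += 1
        return per_participant

    demo = [
        Annotation(12.0, 15.5, "P1", "imitates-partner"),
        Annotation(14.0, 16.0, "P2", "claims-territory"),
        Annotation(30.2, 33.0, "P1", "imitates-partner"),
    ]
    print(code_frequencies(demo))
    print(codes_by_participant(demo))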


The case study protocol was the following: the users were expected to
perform three musical tasks of different character as well as an informal
discussion, in order to generate sufficient data to analyse several aspects of
behaviour when using the prototype. A questionnaire was also administered and
evaluated. The main focus of the analysis was on the evaluation of the
collaborative interactions enabled by the prototype. The questions we wanted
to address were:
    1. What modes of collaborating with one another did participants find;
    2. What difficulties did participants encounter, and to what extent did
       they find the exercise engaging;
    3. What was their degree of satisfaction with the end result.

From transcription of the video speech and behaviours, and then the process
of open coding, we identified the following concepts: collaboration, musical
aesthetics, learning process and system design. After that, we analysed the
same data using the nomenclature of two existing theoretical
frameworks. The first is a general framework of tangible social interaction
which includes the concepts of tangible manipulation, spatial interaction,
embodied facilitation and expressive representation [12]. The second is
focused on the engagement between participants in music collaboration,
and considers the following features: mutual awareness, shared and
consistent representations, mutual modifiability and annotation [8]. We found
that some of the content analysed had already been discussed in the open coding
process, which provides consistency. Data was also collected using a
questionnaire, which was designed to probe such issues as how aware each
participant had been of other instruments; the difficulty of the tasks, and how
much they felt they had enjoyed and concentrated on them; and the extent to
which they considered they had operated as a team and felt part of a
collaborative process. Responses were recorded using numerical scores, but
the questionnaire also asked for qualitative feedback on how participants
organised themselves as a group and the nature of any rules they created.
We also recorded anonymously the participants' age, gender, previous
experience, love of music, and the instrument they had been allocated on the
table.
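
For example, the numerical responses could be summarised per question with a few
lines of Python; the question labels and scores below are invented for illustration
and are not the study's data.

    from statistics import mean, stdev

    responses = {
        "awareness_of_others": [4, 5, 3, 4],
        "task_difficulty":     [2, 3, 2, 2],
        "felt_part_of_team":   [5, 4, 5, 4],
    }

    for question, scores in responses.items():
        spread = stdev(scores) if len(scores) > 1 else 0.0
        print(f"{question}: mean={mean(scores):.2f}, sd={spread:.2f}")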

Within a user-centered design approach, with active participation of users in the
process of designing the prototype, the most prominent aspects that have
emerged as enhancements of multi-touch applications in music collaboration
are:
   • Responsiveness. The responsiveness determines the perceived
       emotiveness. This parameter should be adequately related to the
       application performance in terms of time and computer resources used.
       Consistent audiovisual feedback will enhance the perceived response
       of the application.
   •   Shared vs. individual controls. Both shared and individual spaces are
       needed. Shared features would strengthen mutual awareness and mutual
       modifiability. Individual spaces would strengthen personal opinion,
       musical identity and musical expression.

The findings of this study help us understand engagement in music
collaboration. Qualitative video analysis and the questionnaires provide an
indication of mutually engaging interaction: participants were engaged both
with the collaboratively produced music and with each other in the activity.
A high degree of satisfaction with the end result is evidenced
mostly in the gestural mode. The evidence of participants constantly exchanging
ideas indicates that the prototype strongly facilitates conversation,
which, as noted earlier, is important in terms of group productivity.

In the future, we are interested in how many, and what type of, affordances
such applications should offer in order to maximise engagement. We are also
interested in validating our evaluation method. To that end, there is scope to
improve the responsiveness of the prototype and to redesign the distribution
of shared and individual controls. Furthermore, there is a plan to add
individual continuous controls for sound parameter modifications in order to
encourage a process-oriented composition. The mutual experience might also be
enhanced, and collaboration deepened, by adding common controls. A
balance must be struck between adding more features and keeping the design simple, in
order to attract both novices and experts alike.


[1] T. Blaine and S. Fels, “Collaborative musical experiences for novices,”
Journal of New Music Research, vol. 32, no. 4, pp. 411–428, 2003.

[2] S. Jordà, M. Kaltenbrunner, G. Geiger, and R. Bencina, “The reacTable*,”
in Proceedings of the International Computer Music Conference (ICMC 2005),
(Barcelona, Spain), 2005.


[3] S. Jordà, G. Geiger, M. Alonso, and M. Kaltenbrunner, “The reacTable:
Exploring the synergy between live music performance and tabletop tangible
interfaces,” in TEI ’07: Proceedings of the 1st international conference on
Tangible and embedded interaction, (New York, NY, USA), pp. 139–146, ACM,
2007.

[4] T. Iwai, “Composition on the table,” in International Conference on
Computer Graphics and Interactive Techniques, SIGGRAPH: ACM Special
Interest Group on Computer Graphics and Interactive Techniques, ACM,
1999.

[5] http://www.fashionbuddha.com/, accessed 15/3/2010.

[6] M. Blythe and M. Hassenzahl, The semantics of fun: differentiating
enjoyable experiences, pp. 91–100. Norwell, MA, USA: Kluwer Academic
Publishers, 2004.

[7] M. Csikszentmihalyi, Beyond Boredom and Anxiety: Experiencing Flow in
Work and Play. Jossey-Bass, 1975.







[8] N. Bryan-Kinns and F. Hamilton, “Identifying mutual engagement,”
Behaviour and Information Technology, 2009.

[9] K. Sawyer, Group Genius: The Creative Power of Collaboration. Basic
Books, 2007.

[10] J. Lazar, J. Feng, and H. Hochheiser, Research Methods in Human-
Computer Interaction. Wiley, 2010.

[11] D. L. Altheide, “Ethnographic content analysis,” Qualitative Sociology, vol.
10, pp. 65–77, 1987.

[12] E. Hornecker and J. Buur, “Getting a grip on tangible interaction: A
framework on physical space and social interaction,” in CHI ’06: Proceedings
of the SIGCHI conference on Human Factors in computing systems, (New
York, NY, USA), pp. 437–446, ACM Press, 2006.








            A Release Planning Model to Handle Security
                          Requirements
                                        Saad Bin Saleem
                       Centre for Research in Computing, The Open University
                                    s.b.saleem@open.ac.uk



Basic information

Supervisors:                       Dr. Charles Haley
                                   Dr. Yijun Yu
                                   Professor Bashar Nuseibeh
                                   Professor Anne De Roeck
Department:                        Computing
Status:                            Full-time Research Student
Probation Viva:                    Probably in November, 2010
Starting Date:                     Joined OU on 1st February 2010



Background
Nowadays the use of computer technology is growing rapidly and almost everybody in the world
depends on computer systems [1]. More and more people and organizations are using computer
systems to process, store and manage their highly sensitive data [2]. Any loss, theft or alteration of
this data can cause a serious incident, which may in turn lead to human disaster. Therefore, proper
security of computer systems is very important to avoid such events.
Software is an important component of any computer system, and a software security failure can cause
a malfunction of the overall system [1]. Many scientists and engineers report that software
security related problems have been increasing over the years and that secure software development remains a
challenging area for the software community [3, 4].
For the development of secure software, the early inclusion of security concerns in the Software
Development Life Cycle (SDLC) is suggested by many researchers [1, 4]. They consider that it is
very helpful for improving overall software security and can be useful for addressing common security threats
at the design and architecture level [1, 4]. For this purpose, understanding security requirements at
the early stages of the SDLC is very important, as security requirements are ignored in most cases [5,
6]. It is also considered that software security is closely related to confidentiality, availability and
integrity [7]. But in some cases security is much more than that and depends on many other constraints,
such as stakeholders [6, 7]. To elicit all kinds of security requirements, a systematic procedure named
Security Requirements Engineering (SRE) is suggested in the literature [5]. This process ensures that
elicited security requirements are complete, consistent and easy to understand [5].
A Requirements Engineering (RE) process consists of many stages, from elicitation to requirements
validation and Release Planning (RP). RP is considered an important phase of RE in bespoke and
market-driven software development. RP is divided into two major subtypes, strategic RP
and operational RP [9, 12]. Selecting an optimum set of features or requirements to deliver
in a release is called strategic RP, or road-mapping, and is performed at the product level [9, 10]. On the
other hand, the allocation of resources for the realization of a product is called operational RP and is performed
to decide when a product release should be delivered [10].
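
To make the strategic RP idea concrete, the following minimal Python sketch illustrates a
simple greedy selection of requirements by value per unit of effort under an effort capacity,
with part of the capacity reserved for security requirements so that they are not crowded out
by functional ones. It is purely illustrative: the requirement data are invented and this is
not the model proposed by this research.

    def plan_release(requirements, capacity, security_share=0.2):
        """requirements: list of dicts with 'id', 'value', 'effort', 'security' keys."""
        security_budget = capacity * security_share
        selected, used, used_security = [], 0.0, 0.0

        # Rank all candidates by value density (value per unit of effort).
        ranked = sorted(requirements, key=lambda r: r["value"] / r["effort"], reverse=True)

        # First pass: fill the reserved budget with security requirements.
        for r in (x for x in ranked if x["security"]):
            if used_security + r["effort"] <= security_budget:
                selected.append(r["id"])
                used_security += r["effort"]
                used += r["effort"]

        # Second pass: fill the remaining capacity with any unselected requirement.
        for r in ranked:
            if r["id"] not in selected and used + r["effort"] <= capacity:
                selected.append(r["id"])
                used += r["effort"]
        return selected

    reqs = [
        {"id": "F1", "value": 9, "effort": 5, "security": False},
        {"id": "F2", "value": 7, "effort": 3, "security": False},
        {"id": "S1", "value": 6, "effort": 4, "security": True},
        {"id": "S2", "value": 4, "effort": 2, "security": True},
    ]
    print(plan_release(reqs, capacity=10))   # e.g. ['S2', 'F2', 'F1']

Published RP models are considerably more sophisticated than this sketch; it is included only
to show why treating security requirements explicitly can change which requirements are selected.
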
In the RP process, it is common to select as many functional requirements or features as possible
for a release and deliver them to the customer or market as soon as possible [11]. In this way, there is a risk
of compromising quality requirements in general, and security requirements in particular, which
consequently exposes the software to many threats [15]. Some existing RP models
deal with quality requirements as general technical (hard) constraints but do not
specifically consider these requirements for prioritization alongside functional requirements [9, 11, 12,
15]. Therefore, identifying and fixing any security concerns during the selection of requirements for a
release, and before deciding the time to delivery, can make software less prone to security failures. It can
also help in delivering security incrementally, as organizations can never claim that a software product is one
hundred percent secure and always need to improve it further.
Based on the above discussion, it is observed that security requirements need to be considered in RP for
better product strategies and the delivery of secure software to customers. So, there is a need to align
security requirements with RP by developing a model which treats security requirements separately
for strategic and operational RP in order to release secure software.

Current research in SRE aims to improve existing methods to elicit, analyze, specify, validate and
manage security requirements [3, 13]. For example, Haley et al. have proposed a framework for eliciting
security requirements and highlighted some further research directions in the area [3]. Similarly, in
RP, Ruhe et al. have extended the existing Evolve+ approach with three parameters (time-dependent
value functions, flexible release dates, and adjusted time-dependent resource capacities) for
improved planning [16]. Saleem and Shafique identified the need to improve existing RP models
according to the needs of industry [8].

So, this study will contribute to SRE and RP research, as its purpose is to develop a model
which treats security requirements in conjunction with functional requirements for strategic and
operational RP. The research will be conducted in three phases. In the first phase, the impact of security
requirements on strategic and operational RP will be analyzed. In the second phase, a model
will be developed based on the results of the first phase. In the third phase, the developed model will be
validated to verify its effectiveness.


Research Questions
The following preliminary research questions are based on the purpose of the study.

   RQ1. What practices exist in the literature for dealing with security requirements in strategic and
        operational RP?
   RQ2. What are the implications of security requirements for strategic and operational RP compared to
        functional requirements and/or other quality requirements?
   RQ3. What is an appropriate mechanism for developing a model that treats security requirements
        as requirements in their own right, rather than as constraints on the prioritization of functional requirements?
   RQ4. What other kinds of constraints should the model consider when developing strategic and
         operational release plans?
   RQ5. To what extent is the proposed model effective?

Research Methodology
Qualitative and quantitative research methodologies will be used to conduct the research in two
different stages [14]. A literature review and industrial interviews will be used as strategies of
inquiry in the first stage of the research. For example, the literature review will be used to identify existing
practices for dealing with security requirements during strategic and operational RP, to analyze existing models of
strategic and operational RP, and to identify any constraints that should be considered for strategic and
operational RP based on security and all other kinds of requirements. Similarly, industrial interviews
will be used alongside the literature review to identify any implications of security requirements for
strategic and operational RP. In the second stage of the research, industrial interviews and experiments will
be adopted as strategies of inquiry to validate the model's functionality.







References
[1] McGraw, G., “Software Security”, IEEE Security & Privacy, 2004

[2] C. Irvine, T. Levin, J. Wilson, D. Shifflet, & B. Peireira, “An Approach to Security Requirements
Engineering for a High Assurance System”, Requirements Engineering Journal, Vol. 7,
No. 4, pp.192-206, 2002

[3] Haley, B. C., Laney, R., Moffett, J., Nuseibeh, B., "Security Requirements Engineering: A
Framework for Representation and Analysis," IEEE Transactions on Software Engineering, vol.34,
no.1, pp.133-153, 2008

[4] Hassan, R., Bohner, S., and El-Kassas, S., “Formal Derivation of Security Design Specifications
From Security Requirements”, In Proceedings of the 4th Annual Workshop on Cyber Security and
information intelligence Research: Developing Strategies To Meet the Cyber Security and information
intelligence Challenges Ahead, pp.1-3, 2008

[5] Mellado, D., Fernández-Medina, E., & Piattini, M., “Applying a Security Requirements
Engineering Process”, Computer Security–ESORICS, Springer, pp. 192-206, 2006

[6] B. H. Cheng and J. M. Atlee, "Research Directions in Requirements Engineering," Future of
Software Engineering, (FOSE07), pp. 285-303, 2007

[7] A. Avizienis, J. C. Laprie, B. Randell, and C. Landwehr, "Basic Concepts and Taxonomy of
Dependable and Secure Computing," IEEE Transactions on Dependable and Secure Computing,
vol. 1, no. 1, pp. 11-33, 2004

[8] Saleem, B. S., Shafique. M.U., “A Study on Strategic Release Planning Models of Academia &
Industry”, Master Thesis, Blekinge Institute of Technology, Sweden, pp.1-81, 2008

[9] Al-Emran, A., Pfahl, D., “Operational Planning, Re-planning and Risk Analysis for Software
Releases”, Proceedings of the 8th International Conference on Product Focused Software Process
Improvement (PROFES), pp. 315-329, 2007


[10] Ruhe, G., Momoh, J., "Strategic Release Planning and Evaluation of Operational Feasibility," In
Proceedings of the 38th Annual Hawaii International Conference on System Sciences (HICSS), vol.9,
pp. 313b, 2005

 [11] Tondel, I.A.; Jaatun, M.G.; Meland, P.H., "Security Requirements for the Rest of Us: A Survey",
IEEE Software, vol.25, no.1, pp.20-27, 2008

[12] Ngo-The, A., and Ruhe, G., “A Systematic Approach for Solving the Wicked Problem of
Software Release Planning”, Soft Comput, vol. 12, no.1, pp. 95-108, 2007

[13] Jing-Song Cui; Da Zhang, "The Research and Application of Security Requirements Analysis
Methodology of Information Systems”, 2nd International Conference on Anti-counterfeiting, Security
and Identification, pp.30-36, 2008

[14] Creswell, W. J., Research Design: Qualitative, Quantitative, and Mixed Method Approaches,
Second Edition, Thousand Oaks: Sage, pp.1-246, 2003







[15] Svahnberg, M., Gorschek, T., Feldt, R., Torkar, R., Saleem, B. S., and Shafique, U. M., “A
systematic review on strategic release planning models,” Information and Software Technology, vol.
52, no.3, pp. 237-248, 2010

[16] Elroy, J., and Ruhe, G., “When-to-release decisions for features with time-dependent value
functions,” to appear in Journal of Requirements Engineering, 2010




                                          Page 125 of 125

CRC Conference proceedings

  • 1.
    Proceedings of the2010 CRC PhD Student Conference Centre for Research in Computing The Open University Milton Keynes June 3 and 4, 2010
  • 3.
    Centre for Researchin Computing The Open University Milton Keynes, UK Conference organization: Marian Petre Robin Laney Mathieu D’Aquin Paul Piwek Debbie Briggs May 2010 Proceedings compiled by Paul Piwek
  • 5.
    Table of Contents MihhailAizatulin Verifying Implementations of Security ......... 1 Protocols in C Simon Butler Analysing Semantic Networks of ......... 5 Identifier Names to Improve Source Code Maintainability and Quality Tom Collins Discovering Translational Patterns in ......... 9 Symbolic Representation of Music Joe Corneli Semantic Adaptivity and Social ......... 12 Networking in Personal Learning Networks Richard Doust Investigating narrative “effects”: the ......... 15 case of suspense Francois Verifying Authentication Properties of ......... 19 Dupressoir C Security Protocol Code Using General Verifiers Jennifer Ferreira Agile development and usability in ......... 23 practice: Work cultures of engagement Michael A Model Driven Architecture of Large ......... 26 Giddings Distributed Hard Real Time Systems Alan Hayes An Investigation into Design ......... 30 Diagrams and Their Implementations Robina An Investigation into Interoperability ......... 33 Hetherington of Data Between Software Packages used to support the Design, Analysis and Visualisation of Low Carbon Buildings Chris Ireland Understanding Object-Relational ......... 37 Impedance Mismatch: A Framework Based Approach
  • 6.
    Lukasz “Privacy Shake”, a Haptic Interface ......... 41 Jedrzejczyk for Managing Privacy Settings in Mobile Location Sharing Applications Stefan Designing a Climate Change Game for ......... 45 Kreitmayer Interactive Tabletops Tamara Lopez Reasoning about Flaws in Software ......... 47 Design: Diagnosis and Recovery Lin Ma Presupposition Analysis in ......... 51 Requirements Lionel Montrieux Merging Verifiable and Evolving ......... 55 Access Control Properties Sharon Moyo Effective Tutoring with Affective ......... 58 Embodied Conversational Agents Brendan Murphy Evaluating a mobile learning ......... 60 environment in a home car domain Tu Anh Nguyen Generating Accessible Natural ......... 65 Language Explanations for OWL Ontologies Chwhynny Supporting the Exploration of ......... 69 Overbeeke Research Spaces Nadia Pantidi Understanding technology-rich ......... 74 learning spaces Aleksandra How best to support scientific end- ......... 78 Pawlik user software development? Brian Pluss Non-Cooperation in Computational ......... 82 Models of Dialogue Ivana Quinto A Debate Dashboard to Support the ......... 86 Adoption of On-line Argument Mapping Tools Adam Rae Supporting multimodal media ......... 91 recommendation and annotation using social network analysis Rien Sach The effect of Feedback ......... 95
  • 7.
    Stefan Using Business Process Security ......... 98 Taubenberger Requirements for IT Security Risk Assessment Keerthi Thomas Distilling Privacy Requirements for ......... 102 Mobile Applications Min Q. Tran Understanding the Influence of 3D ......... 104 Virtual Worlds on Perceptions of 2D E- commerce Websites Thomas Daniel Supporting Reflection about Web ......... 108 Ullmann Resources within Mash-Up Learning Environments Rean van der Local civic governance using online ......... 110 Merwe media – a case of consensual problem solving or a recalcitrant pluralism Katie Wilkie Analysis of conceptual metaphors to ......... 114 inform music interaction designs Anna Xambo Issues and techniques for collaborative ......... 118 music making on multi-touch surfaces Saad Bin Saleem A Release Planning Model to Handle ......... 122 Security Requirements
  • 8.
    2010 CRC PhDStudent Conference Verifying Implementations of Security Protocols in C Mihhail Aizatulin m.aizatulin@open.ac.uk Supervisors Dr Andrew Gordon, adg@microsoft.com, Dr Jan J¨rjens, jan.jurjens@cs.tu-dortmund.de, u Prof Bashar Nuseibeh, B.Nuseibeh@open.ac.uk Department Computing Status Full-time Probation viva Passed Starting date November 2008 Our goal is verification of cryptographic protocol implementations (such as OpenSSL or Kerberos), motivated by the desire to minimise the gap between verified and executable code. Very little has been done in this area. There are numerous tools to find low-level bugs in code (such as buffer overflows and zero division) and there are verifiers for cryptographic protocols that work on fairly abstract descriptions, but so far very few attempts have been done to verify cryptographic security directly on the code, especially for low-level languages like C. We attempt to verify the protocol code by extracting an abstract model that can be used in high-level cryptographic verification tools such as ProVerif or CryptoVerif. This is the first such approach that we are aware of. Currently we investigate the feasibility of the approach by extracting the model from running code, using the so called concolic (concrete + symbolic) execution. We run the protocol implementation normally, but at the same time we record all the operations performed on binary values and then replay those operations on symbolic values. The resulting symbolic expressions reveal the structure of the messages sent to the network and the conditions that are checked for incoming messages. We are able to produce symbolic execution traces for the handshake imple- mented in the OpenSSL library. To give an example of what the extracted traces look like, consider a simple request-response protocol, protected by hashing with a shared key: A → B : m|hash(‘request’|m, kAB ), B → A : m |hash(‘response’|m|m , kAB ). We implemented the protocol in about 600 lines of C code, calling to the OpenSSL cryptographic library. Our concolic execution tool produces a trace of 8 lines Page 1 of 125
  • 9.
    2010 CRC PhDStudent Conference write(i39) payload1 = payload() key2 = key() write(i14|7c|payload1|HMAC(sha1, i7|7c52657175657374|payload1, key2)) msg3 = read() var4 = msg3{5,23} branchF((memcmp(msg3{28,20}, HMAC(sha1, i8|7c526573706f6e7365|i14|7c|payload1|var4, key2)) != i0)) accept(var4) Figure 1: An excerpt from the symbolic client trace. X{start, len} denotes the substring of X starting at start of length len. iN is an integer with value N (width information is omitted), and branchT and branchF are the true or false branches taken by the code. for the client side shown in figure 1: we see the client sending the request and checking the condition on the server response before accepting it. We are currently working to implement symbolic handling of buffer lengths and sound handling of loops as well as making the extracted models compatible with those understood by ProVerif and CryptoVerif, in particular simplifying away any remaining arithmetic expressions from the symbolic trace. One obvious drawback of concolic execution is that it only follows the single path that was actually taken by the code. This is enough to produce an accurate model when there is only one main path, however, libraries like OpenSSL contain multiple nontrivial paths. Thus, to achieve verification of those libraries, we plan to move the analysis towards being fully static in future. Related Work One of the earliest security verification attempts directly on code is probably CSur [Goubault-Larrecq and Parrennes, 2005] that deals directly with C protocol implementations. It translates programs into a set of Horn clauses that are fed directly into a general purpose theorem prover. Unfortunately, it never went beyond some very simple implementations and has not been developed since. The work [J¨rjens, 2006] describes an approach of translating Java programs u in a manner similar to above. In our work we try to separate reasoning about pointers and integers from reasoning about cryptography, in hope to achieve greater scalability. Some work has been done on verification of functional language implementa- tions, either by translating the programs directly into π-calculus [Bhargavan et al., 2006; Bhargavan et al., 2008] or by designing a type system that enforces security [Bengtson et al., 2008]. Unfortunately, it is not trivial to adapt such approaches to C-like languages. ASPIER [Chaki and Datta, 2008] is using model checking for verification and has been applied to OpenSSL. However, it does not truly start from C code: any code explicitly dealing with pointers needs to be replaced by abstract summaries Page 2 of 125
  • 10.
    2010 CRC PhDStudent Conference that presumably have to be written manually. Concolic execution is widely used to drive automatic test generation, like in [Cadar et al., 2008] or [Godefroid et al., 2008]. One difference in our concolic execution is that we need to assign symbols to whole bitstrings, whereas the testing frameworks usually assign symbols to single bytes. We believe that our work could be adapted for testing of cryptographic software. Usual testing approaches try to create an input that satisfies a set of equations resulting from checks in code. In presence of cryptography such equations will (hopefully) be impossible to solve, so a more abstract model like ours might be useful. A separate line of work deals with reconstruction of protocol message formats from implementation binaries [Caballero et al., 2007; Lin et al., 2008; Wondracek et al., 2008; Cui et al., 2008; Wang et al., 2009]. The goal is typically to reconstruct field boundaries of a single message by observing how the binary processes the message. Our premises and goals are different: we have the advantage of starting from the source code, but in exchange we aim to reconstruct the whole protocol flow instead of just a single message. Our reconstruction needs to be sound to enable verification — all possible protocol flows should be accounted for. References [Bengtson et al., 2008] Jesper Bengtson, Karthikeyan Bhargavan, C´dric Four- e net, Andrew D. Gordon, and Sergio Maffeis. Refinement types for secure implementations. In CSF ’08: Proceedings of the 2008 21st IEEE Computer Security Foundations Symposium, pages 17–32, Washington, DC, USA, 2008. IEEE Computer Society. [Bhargavan et al., 2006] Karthikeyan Bhargavan, C´dric Fournet, Andrew D. e Gordon, and Stephen Tse. Verified interoperable implementations of security protocols. In CSFW ’06: Proceedings of the 19th IEEE workshop on Computer Security Foundations, pages 139–152, Washington, DC, USA, 2006. IEEE Computer Society. [Bhargavan et al., 2008] Karthikeyan Bhargavan, C´dric Fournet, Ricardo Corin, e and Eugen Zalinescu. Cryptographically verified implementations for TLS. In CCS ’08: Proceedings of the 15th ACM conference on Computer and communications security, pages 459–468, New York, NY, USA, 2008. ACM. [Caballero et al., 2007] Juan Caballero, Heng Yin, Zhenkai Liang, and Dawn Song. Polyglot: automatic extraction of protocol message format using dynamic binary analysis. In CCS ’07: Proceedings of the 14th ACM conference on Computer and communications security, pages 317–329, New York, NY, USA, 2007. ACM. [Cadar et al., 2008] Cristian Cadar, Daniel Dunbar, and Dawson Engler. Klee: Unassisted and automatic generation of high-coverage tests for complex sys- Page 3 of 125
  • 11.
    2010 CRC PhDStudent Conference tems programs. In USENIX Symposium on Operating Systems Design and Implementation (OSDI 2008), San Diego, CA, december 2008. [Chaki and Datta, 2008] Sagar Chaki and Anupam Datta. Aspier: An auto- mated framework for verifying security protocol implementations. Technical Report 08-012, Carnegie Mellon University, October 2008. [Cui et al., 2008] Weidong Cui, Marcus Peinado, Karl Chen, Helen J. Wang, and Luis Irun-Briz. Tupni: automatic reverse engineering of input formats. In CCS ’08: Proceedings of the 15th ACM conference on Computer and communications security, pages 391–402, New York, NY, USA, 2008. ACM. [DBL, 2008] Proceedings of the Network and Distributed System Security Sympo- sium, NDSS 2008, San Diego, California, USA, 10th February - 13th February 2008. The Internet Society, 2008. [Godefroid et al., 2008] Patrice Godefroid, Michael Y. Levin, and David A. Mol- nar. Automated whitebox fuzz testing. In NDSS [2008]. [Goubault-Larrecq and Parrennes, 2005] J. Goubault-Larrecq and F. Parrennes. Cryptographic protocol analysis on real C code. In Proceedings of the 6th International Conference on Verification, Model Checking and Abstract Inter- pretation (VMCAI’05), volume 3385 of Lecture Notes in Computer Science, pages 363–379. Springer, 2005. [J¨rjens, 2006] Jan J¨ rjens. Security analysis of crypto-based Java programs u u using automated theorem provers. In ASE ’06: Proceedings of the 21st IEEE/ACM International Conference on Automated Software Engineering, pages 167–176, Washington, DC, USA, 2006. IEEE Computer Society. [Lin et al., 2008] Zhiqiang Lin, Xuxian Jiang, Dongyan Xu, and Xiangyu Zhang. Automatic protocol format reverse engineering through context-aware moni- tored execution. In NDSS [2008]. [Wang et al., 2009] Zhi Wang, Xuxian Jiang, Weidong Cui, Xinyuan Wang, and Mike Grace. Reformat: Automatic reverse engineering of encrypted messages. In Michael Backes and Peng Ning, editors, ESORICS, volume 5789 of Lecture Notes in Computer Science, pages 200–215. Springer, 2009. [Wondracek et al., 2008] Gilbert Wondracek, Paolo Milani Comparetti, Christo- pher Kruegel, and Engin Kirda. Automatic Network Protocol Analysis. In 15th Symposium on Network and Distributed System Security (NDSS), 2008. Page 4 of 125
  • 12.
    2010 CRC PhDStudent Conference Analysing semantic networks of identifier names to improve source code maintainability and quality Simon Butler sjb792@student.open.ac.uk Supervisors Michel Wermelinger, Yijun Yu & Helen Sharp Department/Institute Centre for Research in Computing Status Part-time Probation viva After Starting date October 2008 Source code is the written expression of a software design consisting of identifier names – natural language phrases that represent concepts being manipulated by the program – embedded in a framework of keywords and operators provided by the programming language. Identifiers are crucial for program comprehen- sion [9], a necessary activity in the development and maintenance of software. Despite their importance, there is little understanding of the relationship be- tween identifier names and source code quality and maintainability. Neither is there automated support for identifier management or the selection of relevant natural language content for identifiers during software development. We will extend current understanding of the relationship between identifier name quality and source code quality and maintainability by developing tech- niques to analyse identifiers for meaning, modelling the semantic relationships between identifiers and empirically validating the models against measures of maintainability and software quality. We will also apply the analysis and mod- elling techniques in a tool to support the selection and management of identifier names during software development, and concept identification and location for program comprehension. The consistent use of clear identifier names is known to aid program com- prehension [4, 7, 8]. However, despite the advice given in programming conven- tions and the popular programming literature on the use of meaningful identifier names in source code, the reality is that identifier names are not always meaning- ful, may be selected in an ad hoc manner, and do not always follow conventions [5, 1, 2]. Researchers in the reverse engineering community have constructed mod- els to support program comprehension. The models range in complexity from textual search systems [11], to RDF-OWL ontologies created either solely from source code and identifier names [8], or with the inclusion of supporting doc- umentation and source code comments [13]. The ontologies typically focus on Page 5 of 125
  • 13.
    2010 CRC PhDStudent Conference class and method names, and are used for concept identification and location based on the lexical similarity of identifier names. The approach, however, does not directly address the quality of identifier names used. The development of detailed identifier name analysis has focused on method names because their visibility and reuse in APIs implies a greater need for them to contain clear information about their purpose [10]. Caprile and Tonella [3] derived both a grammar and vocabulary for C function identifiers, sufficient for the implementation of automated name refactoring. Høst and Østvold [5] have since analysed Java method names looking for a common vocabulary that could form the basis of a naming scheme for Java methods. Their analysis of the method names used in multiple Java projects found common grammatical forms; however, there were sufficient degenerate forms for them to be unable to derive a grammar for Java method names. The consequences of identifier naming problems have been considered to be largely confined to the domain of program comprehension. However, Deißenb¨ck o and Pizka observed an improvement in maintainability when their rules of con- cise and consistent naming were applied to a project [4], and our recent work found statistical associations between identifier name quality and source code quality [1, 2]. Our studies, however, only looked at the construction of the identifier names in isolation, and not at the relationships between the meaning of the natural language content of the identifiers. We hypothesise that a rela- tionship exists between the quality of identifier names, in terms of their natural language content and semantic relationships, and the quality of source code, which can be understood in terms of the functionality, reliability, and usability of the resulting software, and its maintainability [6]. Accordingly, we seek to answer the following research question: How are the semantic relationships between identifier names, in- ferred from their natural language content and programming lan- guage structure, related to source code maintainability and quality? We will construct models of source code as semantic networks predicated on both the semantic content of identifier names and the relationships between identifier names inferred from the programming language structure. For exam- ple, the simple class Car in Figure 1 may be represented by the semantic network in Figure 2. Such models can be applied to support empirical investigations of the relationship between identifier name quality and source code quality and maintainability. The models may also be used in tools to support the manage- ment and selection of identifier names during software development, and to aid concept identification and location during source code maintenance. public c l a s s Car extends V e h i c l e { Engine e n g i n e ; } Figure 1: The class Car We will analyse identifier names mined from open source Java projects to create a catalogue of identifier structures to understand the mechanisms em- ployed by developers to encode domain information in identifiers. We will build Page 6 of 125
  • 14.
    2010 CRC PhDStudent Conference on the existing analyses of C function and Java method identifier names [3, 5, 8], and anticipate the need to develop additional techniques to analyse identifiers, particularly variable identifier names. extends Car Vehicle has a has instance named Engine engine Figure 2: A semantic network of the class Car Modelling of both the structural and semantic relationships between iden- tifiers can be accomplished using Gellish [12], an extensible controlled natural language with dictionaries for natural languages – Gellish English being the variant for the English language. Unlike a conventional dictionary, a Gellish dictionary includes human- and machine-readable links between entries to de- fine relationships between concepts – thus making Gellish a semantic network – and to show hierarchical linguistic relationships such as meronymy, an entity– component relationship. Gellish dictionaries also permit the creation of multiple conceptual links for individual entries to define polysemic senses. The natural language relationships catalogued in Gellish can be applied to establish whether the structural relationship between two identifiers implied by the programming language is consistent with the conventional meaning of the natural language found in the identifier names. For example, a field is implic- itly a component of the containing class allowing the inference of a conceptual and linguistic relationship between class and field identifier names. Any incon- sistency between the two relationships could indicate potential problems with either the design or with the natural language content of the identifier names. We have assumed a model of source code development and comprehension predicated on the idea that it is advantageous for coherent and relevant semantic relationships to exist between identifier names based on their natural language content. To assess the relevance of our model to real-world source code we will validate the underlying assumption empirically. We intend to mine both software repositories and defect reporting systems to identify source code impli- cated in defect reports and evaluate the source code in terms of the coherence and consistency of models of its identifiers. To assess maintainability we will investigate how source code implicated in defect reports develops in successive versions – e.g. is the code a continuing source of defects? – and monitor areas of source code modified between versions to determine how well our model predicts defect-prone and defect-free regions of source code. We will apply the results of our research to develop a tool to support the selection and management of identifier names during software development, as well as modelling source code to support software maintenance. We will evaluate and validate the tool with software developers – both industry partners and FLOSS developers – to establish the value of identifier naming support. While intended for software developers, the visualisations of source code presented by Page 7 of 125
  • 15.
    2010 CRC PhDStudent Conference the tool will enable stakeholders (e.g. domain experts) who are not literate in programming or modelling languages (like Java and UML) to examine, and feedback on, the representation of domain concepts in source code. References [1] S. Butler, M. Wermelinger, Y. Yu, and H. Sharp. Relating identifier naming flaws and code quality: an empirical study. In Proc. of the Working Conf. on Reverse Engineering, pages 31–35. IEEE Computer Society, 2009. [2] S. Butler, M. Wermelinger, Y. Yu, and H. Sharp. Exploring the influence of identifier names on code quality: an empirical study. In Proc. of the 14th European Conf. on Software Maintenance and Reengineering, pages 159–168. IEEE Computer Society, 2010. [3] B. Caprile and P. Tonella. Restructuring program identifier names. In Proc. Int’l Conf. on Software Maintenance, pages 97–107. IEEE, 2000. [4] F. Deißenb¨ck and M. Pizka. Concise and consistent naming. Software o Quality Journal, 14(3):261–282, Sep 2006. [5] E. W. Høst and B. M. Østvold. The Java programmer’s phrase book. In Software Language Engineering, volume 5452 of LNCS, pages 322–341. Springer, 2008. [6] International Standards Organisation. ISO/IEC 9126-1: Software engineer- ing – product quality, 2001. [7] D. Lawrie, H. Feild, and D. Binkley. An empirical study of rules for well- formed identifiers. Journal of Software Maintenance and Evolution: Re- search and Practice, 19(4):205–229, 2007. [8] D. Ratiu. Intentional Meaning of Programs. PhD thesis, Technische Uni- ¸ versit¨t M¨nchen, 2009. a u [9] V. Rajlich and N. Wilde. The role of concepts in program comprehension. In Proc. 10th Int’l Workshop on Program Comprehension, pages 271–278. IEEE, 2002. [10] M. Robillard. What makes APIs hard to learn? Answers from developers. IEEE Software, 26(6):27–34, Nov.-Dec. 2009. [11] G. Sridhara, E. Hill, L. Pollock, and K. Vijay-Shanker. Identifying word relations in software: a comparative study of semantic similarity tools. In Proc Int’l Conf. on Program Comprehension, pages 123–132. IEEE, June 2008. [12] A. S. H. P. van Renssen. Gellish: a generic extensible ontological language. Delft University Press, 2005. [13] R. Witte, Y. Zhang, and J. Rilling. Empowering software maintainers with semantic web technologies. In European Semantic Web Conf., pages 37–52, 2007. Page 8 of 125
  • 16.
    2010 CRC PhDStudent Conference Discovering translational patterns in symbolic representations of music Tom Collins http://users.mct.open.ac.uk/tec69 Supervisors Robin Laney Alistair Willis Paul Garthwaite Department/Institute Centre for Research in Computing Status Fulltime Probation viva After Starting date October 2008 RESEARCH QUESTION How can current methods for pattern discovery in music be improved and integrated into an automated composition system? The presentation will address the first half of this research question: how can current methods for pattern discovery in music be improved? INTRA-OPUS PATTERN DISCOVERY Suppose that you wish to get to know a particular piece of music, and that you have a copy of the score of the piece or a MIDI file. (Scores and MIDI files are symbolic representations of music and are the focus of my presentation, as opposed to sound recordings.) Typically, to become familiar with a piece, one listens to the MIDI file or studies/plays through the score, gaining an appreciation of where and how material is repeated, and perhaps also gaining an appreciation of the underlying structure. The literature contains several algorithmic approaches to this task, referred to as ‘intra-opus’ pattern discovery [2, 4, 5]. Given a piece of music in a symbolic representation, the aim is to define and evaluate an algorithm that discovers and returns patterns occurring within the piece. Some potential applications for such an algorithm are as follows: • A pattern discovery tool to aid music students. • Comparing an algorithm’s discoveries with those of a music expert as a means of investigating human perception of music. • Stylistic composition (the process of writing in the style of another composer or period) assisted by using the patterns/structure returned by a pattern discovery algorithm [1, 3]. Page 9 of 125
  • 17.
    2010 CRC PhDStudent Conference TWO IMPROVEMENTS Current methods for pattern discovery in music can be improved in two ways: 1. The way in which the algorithm’s discoveries are displayed for a user can be improved. 2. A new algorithm can be said to improve upon existing algorithms if, according to standard metrics, it is the strongest-performing algorithm on a certain task. Addressing the first area for improvement, suppose that an algorithm has discovered hundreds of patterns within a piece of music. Now these must be presented to the user, but in what order? Various formulae have been proposed for rating a discovered pattern, based on variables that quantify attributes of that pattern and the piece of music in which it appears [2, 4]. To my knowledge, none have been derived or validated empirically. So I conducted a study in which music undergraduates examined excerpts taken from Chopin’s mazurkas and were instructed to rate already- discovered patterns, giving high ratings to patterns that they thought were noticeable and/or important. A model useful for relating participants’ ratings to the attributes was determined using variable selection and cross-validation. This model leads to a new formula for rating discovered patterns, and the basis for this formula constitutes a methodological improvement. Addressing the second area for improvement, I asked a music analyst to analyse two sonatas by Domenico Scarlatti and two preludes by Johann Sebastian Bach. The brief was similar to the intra-opus discovery task described above: given a piece of music in staff notation, discover translational patterns that occur within the piece. Thus, a benchmark of translational patterns was formed for each piece, the criteria for benchmark membership being left largely to the analyst’s discretion. Three algorithms—SIA [5], COSIATEC [4] and my own, SIACT—were run on the same pieces and their performance was evaluated in terms of recall and precision. If an algorithm discovers x of the y patterns discovered by the analyst then its recall is x/y. If the algorithm also returns z patterns that are not in the analyst’s benchmark then the algorithm’s precision is x/(x + z). It was found that my algorithm, SIACT, out- performs the existing algorithms with regard to recall and, more often than not, precision. My presentation will give the definition of a translational pattern, discuss the improvements outlined above, and demonstrate how these improvements are being brought together in a user interface. SELECTED REFERENCES 1. Collins, T., R. Laney, A. Willis, and P.H. Garthwaite, ‘Using discovered, polyphonic patterns to filter computer-generated music’, in Proceedings of the International Conference on Computational Creativity, Lisbon (2010), 1-10. 2. Conklin, D., and M. Bergeron, ‘Feature set patterns in music’, in Computer Music Journal 32(1) (2008), 60-70. Page 10 of 125
    2010 CRC PhDStudent Conference 3. Cope, D., Computational models of musical creativity (Cambridge Massachusetts: MIT Press, 2005). 4. Meredith, D., K. Lemström, and G.A. Wiggins, ‘Algorithms for discovering repeated patterns in multidimensional representations of polyphonic music’, in Cambridge Music Processing Colloquium, Cambridge (2003), 11 pages. 5. Meredith, D., K. Lemström, and G.A. Wiggins, ‘Algorithms for discovering repeated patterns in multidimensional representations of polyphonic music’, in Journal of New Music Research 31(4) (2002), 321-345. Page 11 of 125
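The recall and precision figures used in the evaluation described above reduce to simple set arithmetic once the analyst's benchmark and an algorithm's output are expressed as comparable collections of patterns. The sketch below assumes exact matching of patterns represented as frozensets of (onset, pitch) points, which is a simplification of how matches would be judged in practice.

def recall_and_precision(benchmark, discovered):
    """benchmark and discovered are collections of patterns, each pattern given
    as a frozenset of (onset, pitch) points so membership can be tested."""
    benchmark, discovered = set(benchmark), set(discovered)
    hits = benchmark & discovered        # x: analyst patterns also returned by the algorithm
    extras = discovered - benchmark      # z: returned patterns not in the benchmark
    recall = len(hits) / len(benchmark) if benchmark else 0.0
    precision = len(hits) / (len(hits) + len(extras)) if discovered else 0.0
    return recall, precision

motif = frozenset({(0, 60), (1, 62), (2, 64)})
answer = frozenset({(4, 67), (5, 69), (6, 71)})
spurious = frozenset({(0, 60), (4, 67)})
print(recall_and_precision({motif, answer}, {motif, spurious}))   # (0.5, 0.5)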
    2010 CRC PhDStudent Conference Semantic Adaptivity and Social Networking in Personal Learning Environments Joe Corneli j.a.corneli@open.ac.uk Supervisors Alexander Mikroyannidis Peter Scott Department/Institute Knowledge Media Institute Status Fulltime Probation viva Before Starting date 01/01/10 Introductory Remarks I've decided to deal with "personal learning environments" with an eye towards the context of their creation and use. This entails looking not just at ways to help support learning experiences, but also at the complex of experiences and behaviours of the many stakeholders who are concerned with learning. (E.g. educators, content providers, software developers, institutional and governmental organizations.) This broad view is compatible with the idea of a personal learning environment put forward by the progenitors of the PLE model: "Rather than integrate tools within a single context, the system should focus instead on coordinating connections between the user and a wide range of services offered by organizations and other individuals." (Wilson et al., 2006) This problem area, which otherwise threatens to become hugely expansive, invites the creation of a unified methodology and mode of analysis. A key aim of my work is to develop such a method -- a sort of dynamic cartography. In this frame, the social roles of stakeholders are to be understood through their constituent actions. My analysis will then focus on the following question: How can mapping activity patterns in a social context help us support the learning process more effectively? Thematic Issues In order to understand patterns of interaction with data well enough to make useful maps, we must delve a bit into human sense-making behaviour. A small vocabulary of actions related to sense-making provides a model we can then use quite extensively. People look for simplifying patterns. In a countervailing trend, they look for ways to become more usefully interconnected and interoperable. To negotiate between these two types of behaviour, they identify or create "points of coordination" which provide mechanisms of control. They may do experiments, and then document how Page 12 of 125
2010 CRC PhD Student Conference these mechanisms generate effects in a more or less predictable way. Finally, they develop explicit, shareable practices which achieve "desirable" effects. Simplification, interconnection, control, experiment, motivation, and praxis -- these are the thematic issues that inform my technical investigations. Proposed Implementation Work I plan to focus on implementation because it is an ideal place in which to refine and test my ideas about dynamic maps. My efforts will be directed largely into implementation in the following applications. * Etherpad and other related tools for live online interactions -- Data about social interactions is interesting and potentially useful in general, but data about "live" social interactions is becoming increasingly available in forms that are suitable for large-scale computational analysis and real-time use. * RDF and related techniques for data management -- Marking up complex and changing relationships between objects is standard in e.g. computer animation and computer games; it is interesting to think about how these ideas can work in other domains (e.g. to assist with learning). * Wordnet and Latent Semantic Analysis style approaches for clustering and annotating data -- There are various techniques for dividing content into thematic clusters (useful for supporting the simplification behaviours needed for sense making), and for annotating data with new relationships (useful for supporting interconnection behaviours). I will explore these in various applications, e.g. applying them to the streams of data identified above. * Semantic Web style patterns for interoperability -- Content may still be king, but interfaces make up the board on which the game is played. I plan to use an existing standard for mathematical documents (OMDoc) and other API-building tools to help make the PlanetMath.org collection of mathematical resources interoperable with e.g. the OU's SocialLearn platform, contributing to the development of a public service to STEM learners and practitioners worldwide. * Documentation of technical processes -- PlanetMath.org is an example of a tool that has more content contributors than coders, and more feature requests than anyone knows what to do with. Good documentation is part of making hacking easier. Towards this end, I'm planning to build PlanetComputing.org to document the software used on PlanetMath (and many other projects). Conclusion Page 13 of 125
    2010 CRC PhDStudent Conference By the end of my Ph. D. project, I hope to have built a "PLE IDE" -- a tool offering personalized support for both learners and developers. I hope to have a robust theory and practice of dynamical mapping that I will have tested out in several domains related to online learning. Reference Wilson, S., Liber, O., Johnson, M., Beauvoir, P., Sharples, P., & Milligan, C. (2006). Personal Learning Environments: Challenging The Dominant Design Of Educational Systems. Proceedings of 2nd International Workshop on Learner-Oriented Knowledge Management and KM-Oriented Learning, In Conjunction With ECTEL 06. (pp. 67-76), Crete, Greece. Page 14 of 125
    2010 CRC PhDStudent Conference Investigating narrative ‘effects’: the case of suspense Richard Doust, richard.doust@free.fr Supervisors Richard Power, Paul Piwek Department/Institute Computing Status Part-time Probation viva Before Starting date October 2008 1 Introduction Just how do narrative structures such as a Hitchcock film generate the well-known feeling known as suspense ? Our goal is to investigate the structures of narratives that produce various narrative effects such as suspense, curiosity, surprise. The fundamental question guiding this research could be phrased thus: What are the minimal requirements on formal descriptions of narratives such that we can capture these phenomena and generate new narratives which contain them ? Clearly, the above phenomena may depend also on extra-narrative features such as music, filming angles, and so on. These will not be our primary concern here. Our approach consists of two main parts: 1. We present a simple method for defining a Storybase which for our purposes will serve to produce different ‘tellings’ of the same story on which we can test our suspense modelling. 2. We present a formal approach to generating the understanding of the story as it is told, and then use the output of this approach to suggest an algorithm for measuring the suspense level of a given telling of a story. We can thus compare different tellings of a story and suggest which ones will have high suspense, and which ones low. 2 Suspense 2.1 Existing definitions Dictionary definitions of the word ’suspense’ suggest that there really ought to be several different words for what is more like a concept cluster than a single concept. The Collins English dictionary gives three definitions: 1. apprehension about what is going to happen. . . 2. an uncertain cognitive state; "the matter remained in suspense for several years" . . . 3. excited anticipation of an approaching climax; "the play kept the audience in suspense" anticipation, ex- pectancy - an expectation. Gerrig and Bernardo (1994) suggest that reading fiction involves constantly looking for solutions to the plot-based dilemmas faced by the characters in a story world. One of the suggestions which come out of this work is that suspense is greater the lower the number of solutions to the hero’s current problem that can be found by the reader. Cheong and Young’s (2006) narrative generating system uses the idea that a reader’s suspense level depends on the number and type of solutions she can imagine in order to solve the problems facing the narrative’s preferred character. Generally, it seems that more overarching and precise definitions of suspense are wanting in order to connect some of the above approaches. The point of view we will assume is that the principles by which literary narratives are designed are obscured by the lack of sufficiently analytical concepts to define them. We will use as our starting point work on stories by Brewer and Lichtenstein (1981) which seems fruitful in that it proposes not only a view of suspense, but also of related narrative phenomena such as surprise and curiosity. Page 15 of 125 1
2010 CRC PhD Student Conference 2.2 Brewer and Lichtenstein's approach Brewer and Lichtenstein (1981) propose that there are three major discourse structures which account for the enjoyment of a large number of stories: surprise, curiosity and suspense. For suspense, there must be an initiating event which could lead to significant consequences for one of the characters in the narrative. This event leads to the reader feeling concern about the outcome for this character, and if this state is maintained over time, then the reader will feel suspense. As Brewer and Lichtenstein say, often 'additional discourse material is placed between the initiating event and the outcome event, to encourage the build up of suspense' (Brewer and Lichtenstein, 1981, p.17). Much of the current work can be seen as an attempt to formalise and make robust the notions of narrative understanding that Brewer laid out. We will try to suggest a model of suspense which explains, for example, how the placing of additional material between the initiating event and the outcome event increases the suspense felt in a given narrative. We will also suggest ways in which curiosity and surprise could be formally linked to suspense. We also hope that our approach will be able to shed some light on the techniques for creating suspense presented in writers' manuals. 3 The storybase 3.1 Event structure perception Our starting point for analysing story structure is a list of (verbally described) story events. Some recent studies (Speer, 2007) claim that people break narratives down into digestible chunks in this way. If this is the case, then we should expect to discover commonalities between different types of narrative (literature, film, storytelling), especially as regards phenomena such as suspense. One goal of this work is to discover just these commonalities. 3.2 Storybase: from which we can talk about variants of the 'same' story. One of the key points that Brewer and Lichtenstein make is that the phenomenon of suspense depends on the order in which information about the story is released, as well as on which information is released and which withheld. One might expect, following this account, that telling 'the same story' in two different ways might produce different levels of suspense. In order to be able to test different tellings of the same story, we define the notion of a STORYBASE. This should consist of a set of events, together with some constraints on the set. Any telling of the events which obeys these constraints should be recognised by most listeners as being 'the same story'. We define four types of link between the members of the set of possible events: • Starting points, Event links, Causal constraints, Stopping points. The causal constraints can be positive or negative. They define, for example, which events need to have been told for others to now be able to be told. Our approach can be seen as a kind of specialised story-grammar for a particular story. The grammar generates 'sentences', and each 'sentence' is a different telling of the story. The approach is different to story schemas: we are not trying to encode information about the world at this stage, and any story form is possible. With this grammar, we can generate potentially all of the possible tellings of a given story which are recognisably the same story, and in this way, we can test our heuristics for meta-effects such as suspense on a whole body of stories. 
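As an illustration only, here is a small Python sketch of the storybase idea: events are plain strings, a single type of positive causal constraint ('a must be told before b') stands in for the richer link types listed above, and every ordering that respects the constraints counts as a telling of the same story. The event names and the restriction to one start and one stop point are assumptions made for the example.

from itertools import permutations

def tellings(events, before, start, stop):
    """Enumerate tellings that respect positive causal constraints.
    `before` is a set of (a, b) pairs: event a must be told before event b."""
    middle_events = [e for e in events if e not in (start, stop)]
    for middle in permutations(middle_events):
        telling = (start,) + middle + (stop,)
        position = {e: i for i, e in enumerate(telling)}
        if all(position[a] < position[b] for a, b in before):
            yield telling

events = {"burglar enters", "alarm trips", "hero wakes",
          "neighbour phones police", "confrontation"}
constraints = {("burglar enters", "alarm trips"), ("alarm trips", "hero wakes")}
for t in tellings(events, constraints, "burglar enters", "confrontation"):
    print(" -> ".join(t))
# Three orderings survive, differing only in where the neighbour's call is told.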
4 Inference 4.1 Inference types To model the inferential processes which go on when we listen to or read a story, or watch a film, we define three types of inference: 1. Inference of basic events from sensory input: a perceived action in the narrative together with an 'event classifier module' produces a list of ordered events. 2. Inferences about the current state of the story (or deductions). 3. Inferences about the future state of the story (or predictions). Page 16 of 125
2010 CRC PhD Student Conference Clearly these inferential processes also rely on general knowledge about the world or the story domain, and even about stories themselves. So, for each new story event we build up a set of inferences STORYSOFAR of these three types. At each new story event, new inferences are generated and old inferences rejected. There is a constant process of maintenance of the logical coherence of the set of inferences as the story is told. To model this formally, we create a set of 'inferential triples' of the form: "if X and Y then Z" or X.Y->Z, where X, Y, and Z are Deductions or Predictions. 5 Measuring suspense 5.1 A 'suspense-grammar' on top of the storybase To try to capture phenomena such as suspense, curiosity and surprise, we aim to create and test different algorithms which take as their input the generated story, together with the inferences generated by the triples mentioned above. A strong feature of this approach is that we can test our algorithms on a set of very closely related stories which have been generated automatically. 5.2 Modelling conflicting predictions Our current model of suspense is based on the existence of conflicting predictions with high salience. (This notion of the salience of a predicted conflict could be defined in terms of the degree to which whole sets of following predictions for the characters in the narrative are liable to change. For the moment, intuitively, it relates to how the whole story might 'flow' in a different direction.) For the story domain, we construct the set INCOMP of pairs of mutually conflicting predictions with a given salience: INCOMP = { (P1,NotP1,Salience1), (P2,NotP2,Salience2), . . . } We can now describe a method for modelling the conflicting predictions triggered by a narrative. If at time T, P1 and NotP1 are members of STORYSOFAR, then we have found two incompatible predictions in our 'story-so-far'. 5.3 The predictive chain We need one further definition in order to be able to define our current suspense measure for a story. For a given prediction P1, we (recursively) define the 'prediction chain' function C of P1: C(P1) contains P1 itself, together with every predicted event P such that P.y -> P' for some y, where P' is already a member of C(P1). 5.4 Distributing salience as a rough heuristic for modelling suspense in a narrative Suppose we have a predicted conflict between predictionA and predictionB which has a salience of 10. In these circumstances, it would seem natural to ascribe a salience of 5 to each of the (at least) two predicted events predictionA and predictionB which produce the conflict. Now suppose that leading back from predictionA there is another predictionC that needs to be satisfied for predictionA to occur. How do we spread out the salience of the conflict over these different predicted events? 5.5 A 'thermodynamic' heuristic for creating a suspense measure A predicted incompatibility as described above triggers the creation of CC(P1,P2,Z), the set formed from the two causal chains C(P1) and C(P2) which lead up to these incompatible predictions. Now we have: CC(P1,P2,Z) = C(P1) + C(P2). To determine our suspense heuristic, we first find the size L of CC(P1,P2,Z). At each story step we define the suspense level S in relation to the conflicting predictions P1 and P2 as S = Z / L. Intuitively, one might say that the salience of the predicted incompatibility is 'spread over' or distributed over the relevant predictions that lead up to it. 
We can call this a ‘thermodynamic’ model because it is as if the salience or ‘heat’ of one predicted conflicting moment is transmitted back down the predictive line to the present moment. All events which could have a bearing on any of the predictions in the chain are for this reason subject to extra attention. Page 17 of 125
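A rough Python sketch of the pieces defined so far -- inferential triples, the INCOMP set, prediction chains and the S = Z / L heuristic -- is given below. The string-valued predictions, the toy story and the decision to include deductions in the chains are illustrative assumptions, not part of the model as specified.

def prediction_chain(target, triples):
    """C(target): the target plus everything that feeds into it through
    triples of the form 'if X and Y then Z' (X.Y -> Z)."""
    chain, changed = {target}, True
    while changed:
        changed = False
        for (x, y), z in triples:
            if z in chain:
                for p in (x, y):
                    if p not in chain:
                        chain.add(p)
                        changed = True
    return chain

def suspense(story_so_far, incomp, triples):
    """S = Z / L for the first predicted conflict active in STORYSOFAR."""
    for p, not_p, salience in incomp:
        if p in story_so_far and not_p in story_so_far:
            cc = prediction_chain(p, triples) | prediction_chain(not_p, triples)
            return salience / len(cc)
    return 0.0

triples = [(("dog barks", "window open"), "hero wakes"),
           (("hero wakes", "gun loaded"), "hero survives"),
           (("burglar armed", "hero asleep"), "hero dies")]
incomp = [("hero survives", "hero dies", 10)]
story_so_far = {"dog barks", "window open", "burglar armed", "hero asleep",
                "hero survives", "hero dies"}
print(suspense(story_so_far, incomp, triples))   # 10 / 8 = 1.25 for this toy story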
2010 CRC PhD Student Conference If the set of predictions stays the same over a series of story steps, and if, as a first approximation, we assume that the suspensefulness of a narrative is equivalent to the sum of the suspense levels of the individual story steps, then we can say that the narrative in question will have a total suspense level, relative to this particular predicted conflict, of S-total = Z/L + Z/(L-1) + Z/(L-2) + . . . + Z/1, as the number of predictions in CC(P1,P2,Z) decreases each time a prediction is either confirmed or annulled. To summarise, we can give a working definition of suspense as follows: 5.6 Definition of suspense Definition: the suspense level of a narrative depends on the salience of predicted conflicts between two or more possible outcomes and on the amount of story time that these predicted conflicts remain unresolved and 'active'. From this definition of suspense we would expect two results: 1. the suspense level at a given story step will increase as the number of predictions still to be confirmed leading up to the conflict decreases, and 2. the way to maximise suspense in a narrative is for the narrative to 'keep active' predicted incompatibilities with a high salience over several story steps. In fact, this may be just how suspenseful narratives work. One might say that suspenseful narratives engineer a spreading of the salience of key moments backwards in time, thus maintaining a kind of tension over sufficiently long periods for emotional effects to build up in the spectator. 6 Summary We make two claims: 1. The notion of a storybase is a simple and powerful way to generate variants of the same story. 2. Meta-effects of narrative can be tested by using formal algorithms on these story variants. These algorithms build on modelling of inferential processes and knowledge about the world. 7 References • Brewer, W. F. (1996). The nature of narrative suspense and the problem of rereading. In P. Vorderer, H. J. Wulff, and M. Friedrichsen (Eds.), Suspense: Conceptualizations, theoretical analyses, and empirical explorations. Mahwah, NJ: Lawrence Erlbaum Associates. 107-127. • Brewer, W. F., and Lichtenstein, E. H. (1981). Event schemas, story schemas, and story grammars. In J. Long and A. Baddeley (Eds.), Attention and Performance IX. Hillsdale, NJ: Lawrence Erlbaum Associates. 363-379. • Cheong, Y. G. and Young, R. M. (2006). A Computational Model of Narrative Generation for Suspense. In Computational Aesthetics: Artificial Intelligence Approaches to Beauty and Happiness: Papers from the 2006 AAAI Workshop, ed. Hugo Liu and Rada Mihalcea, Technical Report WS-06-04. American Association for Artificial Intelligence, Menlo Park, California, USA, pp. 8-15. • Gerrig, R. J., and Bernardo, A. B. I. (1994). Readers as problem-solvers in the experience of suspense. Poetics, 22(6), 459-472. • Speer, N. K., Zacks, J. M., & Reynolds, J. R. (2007). Human brain activity time-locked to narrative event boundaries. Psychological Science, 18, 449-455. Page 18 of 125
2010 CRC PhD Student Conference Verifying Authentication Properties of C Security Protocol Code Using General Verifiers François Dupressoir Supervisors Andy Gordon (MSR) Jan Jürjens (TU Dortmund) Bashar Nuseibeh (Open University) Department Computing Registration Full-Time Probation Passed 1 Introduction Directly verifying security protocol code could help prevent major security flaws in communication systems. C is usually used when implementing security software (e.g. OpenSSL, cryptlib, PolarSSL...) because it provides control over side-channels, performance, and portability all at once, along with being easy to call from a variety of other languages. But those strengths also make it hard to reason about, especially when dealing with high-level logical properties such as authentication. Verifying high-level code. The most advanced results on verifying implementations of security protocols tackle high-level languages such as F#. Two main verification trends can be identified on high-level languages. The first one aims at soundly extracting models from the program code, and using a cryptography-specific tool such as ProVerif (e.g. fs2pv [BFGT06]) to verify that the extracted protocol model is secure with respect to a given attacker model. The second approach, on the other hand, aims at using general verification tools such as type systems and static analysis to verify security properties directly on the program code. Using general verification tools permits a user with less expert knowledge to verify a program, and also allows a more modular approach to verification, even in the context of security, as argued in [BFG10]. Verifying C code. But very few widely-used security-oriented programs are written in such high-level languages, and lower-level languages such as C are usually favoured. Several approaches have been proposed for analysing C security protocol code [GP05, ULF06, CD08], but we believe them unsatisfactory for several reasons: • memory-safety assumptions: all three rely on assuming memory-safety 1 Page 19 of 125
2010 CRC PhD Student Conference properties,1 • trusted manual annotations: all three rely on a large amount of trusted manual work, • unsoundness: both [CD08] and [ULF06] make unsound abstractions and simplifications, which is often not acceptable in a security-critical context, • scalability issues: [CD08] is limited to bounded, and in practice small, numbers of parallel sessions, and we believe [GP05] is limited to small programs due to its whole-program analysis approach. 1.1 Goals Our goal is to provide a new approach to soundly verify Dolev-Yao security properties of real C code, with a minimal amount of unverified annotations and assumptions, so that it is accessible to non-experts. We do not aim at verifying implementations of encryption algorithms and other cryptographic operations, but their correct usage in secure communication protocols such as TLS. 2 Framework Previous approaches to verifying security properties of C programs did not define attacker models at the level of the programming language, since they were based on extracting a more abstract model from the analysed C code (CSur and Aspier), or simply verified compliance of the program to a separate specification (as in Pistachio). However, to achieve our scalability goals, we choose to define an attacker model on C programs that enables a modular verification of the code. To avoid issues related to the complex, and often very informal, semantics of the C language, we use the F7 notion of a refined module (see [BFG10]). In F7, a refined module consists of an imported and an exported interface, containing function declarations and predicate definitions, along with a piece of type-checked F# code. The main result states that a refined module with an empty imported interface cannot go wrong, and careful use of assertions allows one to statically verify correspondence properties of the code. Composition results can also be used to combine existing refined modules whilst ensuring that their security properties are preserved. We define our attacker model on C programs by translating F7 interfaces into annotated C header files. The F7 notion of an opponent, and the corresponding security results, can then be transferred to C programs that implement an F7-translated header. The type-checking phase in F7 is, in the case of C programs, replaced by a verification phase, in our case using VCC. We trust that VCC is sound, and claim that verifying that a given C program correctly implements a given annotated C header entails that there exists an equivalent (in terms of attacks within our attacker model) F7 implementation of that same interface. 1 Which may sometimes be purposefully broken as a source of randomness. Page 20 of 125
2010 CRC PhD Student Conference 3 Case Study We show how our approach can be used in practice to verify a simple implementation of an authenticated Remote Procedure Call protocol, that authenticates the pair of communicating parties using a pre-shared key, and links requests and responses together. We show that different styles of C code can be verified using this approach, with varying levels of required annotations, very few of which are trusted by the verifier. We argue that a large part of the required annotations are memory-safety related and would be necessary to verify other properties of the C code, including to verify the memory-safety assumptions made by previous approaches. 4 Conclusion We define an attacker model for C code by interpreting verified C programs as F7 refined modules. We then describe a method to statically prove the impossibility of attacks against C code in this attacker model using VCC [CDH+09], a general C verifier. This approach does not rely on unverified memory-safety assumptions, and the amount of trusted annotations is minimal. We also believe it is as sound and scalable as the verifier that is used. Moreover, we believe our approach can be adapted for use with any contract-based C verifier, and could greatly benefit from the important recent developments in that area. References [BFG10] Karthikeyan Bhargavan, Cédric Fournet, and Andrew D. Gordon. Modular verification of security protocol code by typing. In Proceedings of the 37th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages - POPL '10, pages 445-456, Madrid, Spain, 2010. [BFGT06] Karthikeyan Bhargavan, Cédric Fournet, Andrew D. Gordon, and Stephen Tse. Verified interoperable implementations of security protocols. In CSFW '06: Proceedings of the 19th IEEE workshop on Computer Security Foundations, pages 139-152, Washington, DC, USA, 2006. IEEE Computer Society. [CD08] Sagar Chaki and Anupam Datta. ASPIER: an automated framework for verifying security protocol implementations. Technical Report CMU-CyLab-08-012, CyLab, Carnegie Mellon University, 2008. [CDH+09] Ernie Cohen, Markus Dahlweid, Mark Hillebrand, Dirk Leinenbach, Michal Moskal, Thomas Santen, Wolfram Schulte, and Stephan Tobies. VCC: a practical system for verifying concurrent C. In Proceedings of the 22nd International Conference on Theorem Proving in Higher Order Logics, pages 23-42, Munich, Germany, 2009. Springer-Verlag. [GP05] Jean Goubault-Larrecq and Fabrice Parrennes. Cryptographic protocol analysis on real C code. In Proceedings of the 6th International Page 21 of 125
2010 CRC PhD Student Conference Conference on Verification, Model Checking and Abstract Interpretation (VMCAI'05), volume 3385 of Lecture Notes in Computer Science, pages 363-379. Springer, 2005. [ULF06] Octavian Udrea, Cristian Lumezanu, and Jeffrey S. Foster. Rule-based static analysis of network protocol implementations. In Proceedings of the 15th USENIX Security Symposium, pages 193-208, 2006. Page 22 of 125
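For readers unfamiliar with the case study in Section 3 above, the sketch below illustrates, in Python rather than C and purely by way of example, one common shape for an authenticated RPC exchange over a pre-shared key: the request carries a MAC, and the response MAC also covers the request, which is what links the two messages together. The message format and key handling are invented for illustration; they are not the verified implementation discussed in the abstract.

import hmac, hashlib

def mac(key, *parts):
    h = hmac.new(key, digestmod=hashlib.sha256)
    for part in parts:
        h.update(len(part).to_bytes(4, "big"))   # length-prefix to avoid ambiguous concatenation
        h.update(part)
    return h.digest()

def make_request(key, request):
    return request, mac(key, b"request", request)

def make_response(key, request, response):
    # The response tag covers the request as well, linking response to request.
    return response, mac(key, b"response", request, response)

def check_response(key, request, response, tag):
    return hmac.compare_digest(tag, mac(key, b"response", request, response))

key = b"pre-shared key"
request, request_tag = make_request(key, b"get balance")
response, response_tag = make_response(key, request, b"42")
assert check_response(key, request, response, response_tag)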
    2010 CRC PhDStudent Conference Agile development and usability in practice: Work cultures of engagement Jennifer Ferreira j.ferreira@open.ac.uk Supervisors Helen Sharp Hugh Robinson Department/Institute Computing Status Fulltime Probation viva After Starting date February, 2008 Abstract. Combining usability and Agile development is a complex topic. My academic research, combined with my research into practice, suggests three perspectives from which the topic can be usefully examined. The first two (addressing focus and coordination issues) are typically the perspectives taken in the literature and are popular items for discussion. I propose that there is a third, largely unexplored perspective that requires attention, that of how developers and designers engage in the context of their work cultures. 1 Introduction Both disciplines are still in a state of uncertainty about how one relates to the other — in terms of whether they are addressing the same underlying issues, whether they belong to and should be recognised as one “process”, who takes the lead and who adjusts to whom. The complexity of the problem arises from practitioner and academic contributions to the literature, as well as the varying perspectives the contributors hold. Complexity further arises from the practical settings in which the problem plays out, settings characterised by different balances of power and different levels of influence the designers and developers may have on determining how they work. What is clear, is that the solutions proposed, follow the ways in which the problem is conceptualised. It certainly matters how the problem is conceptualised, as this reflects which issues are important enough to address and the ways to go about doing that. In light of this, we can unpick from the complexity three emerging strands of discussion that deal with usability in an agile domain. For the benefit of the following discussion, I am making the assumption that a developer constituency exists separately from a designer constituency. Further, that if questioned, a developer would not consider themselves doing the work of a designer and vice versa. Of course, this is not always the case in practice. I have encountered Agile teams with no dedicated usability person assigned to work with the team, where developers were addressing usability-related issues as part of their everyday work. This illustrates yet another layer of complexity associated with practice that must be acknowledged, but can not be adequately addressed within the limitations of this paper. 2 A question of focus In the first perspective, the combination of usability approaches with Agile approaches helps practitioners focus on important aspects of software development. While Agile approaches focus on creating working software, usability approaches focus on creating a usable design that may or may not be in the form of working software. A central concern of this perspective is how to support the weaknesses of one with the strengths of the other. Agile approaches are seen to lack an awareness of usability issues, with little guidance for how and when designers contribute to the process. Usability approaches are seen to lack a structured approach to transforming designs into working software and, therefore, little guidance on how developers are involved. Therefore, they are seen as complementary approaches that, used together, improve the outcome of the software development effort. 
This often serves as the motivation for combining Agile development and usability in the first place. We find examples in the literature that combine established Agile approaches, e.g., eXtreme Programming, or Page 23 of 125
    2010 CRC PhDStudent Conference Scrum, with established design approaches, e.g., Usage-Centered Design [6], Usability Engineering [5]. We also find examples of well-known HCI techniques such as personas [1] and scenarios [3] being used on Agile projects. 3 A question of coordination The second perspective on how to bring usability and Agile development together is one where it is considered a problem of coordination. That is, the central concern is how to allow the designers and developers to carry out their individual tasks, and bring them together at the appropriate points. Designers require enough time at the outset of the project to perform user research and sketch out a coherent design. To fit with the time-boxed Agile cycles, usability techniques are often adapted to fit within shorter timescales. Advice is generally to have designers remain ahead of the developers, so that they have enough time to design for what is coming ahead and evaluate what has already been implemented. In the literature we find examples of process descriptions as a way of addressing this coordination issue. They provide a way to mesh the activities of both designers and developers, by specifying the tasks that need to be performed in a temporal sequence (e.g., [4]). 4 Work cultures of engagement The third perspective addresses practical settings and has received little attention so far. In this perspective, rather than concentrating on processes or rational plans that abstract away from the circumstances of the actions, the situatedness of the work of the developers and designers is emphasised. This perspective encompasses both of those discussed above, while acknowledging that issues of coordination and focus are inextricably linked with the setting in which practitioners work. That is, how the developers and designers coordinate their work and how focus is maintained, in practice is shaped and sustained by their work setting. With work culture I specifically mean the “set of solutions produced by a group of people to meet specific problems posed by the situation that they face in common” [2, p.64], in a work setting. If developers and designers are brought together by an organisation, they will be working together amid values and assumptions about the best way to get the work done — the manifestations of a work culture. I combine work cultures with engagement to bring the point across that how developers and designers engage with one another depends in essential ways on the embedded values and assumptions regarding their work and what is considered appropriate behaviour in their circumstances. My research into practice has provided evidence for how practical settings shape developers and designers engaging with one another. We find that developers and designers get the job done through their localised, contingent and purposeful actions that are not explained by the perspectives above. Further, the developers and designers can be embedded in the same work culture, such that they share values, assumptions and behaviours for getting the work done. But we have also encountered examples where developers and designers are in separate groups and embedded in distinct work cultures. Engaging in this sense requires that individuals step outside their group boundaries and figure out how to deal with each other on a daily basis — contending with very different values, assumptions and behaviours compared to their own. 
This is an important perspective to consider because of the implications for practice that it brings — highlighting the role of work culture, self-organisation and purposeful work. It is also a significant perspective, since we are unlikely to encounter teams in practice who are fully self-directed and independent of other teams, individuals or organisational influences. 5 Concluding remarks As we work through the problems that crossing disciplinary boundaries suggest, we simultaneously need an awareness of which conception of the problem is actually being addressed. In this paper I have identified a third perspective requiring attention, where we take account of the work settings in which the combination of Agile development and usability is played out. According to this perspective, it would be unrealistic to expect that one ideal approach would emerge and successfully translate to any other work setting. Instead, it shifts attention to the work cultures involved in usability and Agile development in practice. It shows how understanding and supporting the mechanisms of the work cultures that achieve engagement in that setting, contribute to understanding and supporting the mechanisms that enable usability in an agile domain. References 1. Haikara, J.: Usability in Agile Software Development: Extending the Interaction Design Process with Personas Approach . In: Concas, G., Damiani, E., Scotto, M., Succi, G. (eds.) Page 24 of 125
    2010 CRC PhDStudent Conference Agile Processes in Software Engineering and Extreme Programming. LNCS, vol. 4536/2007, pp. 153–156. Springer, Berlin/Heidelberg (2007) 2. Vaughan, D.: The Challenger Launch Decision: Risky technology, culture and deviance at NASA. The University of Chicago Press, Chicago and London (1996) 3. Obendorf, H., Finck, M.: Scenario-based usability engineering techniques in agile development processes. In: CHI ’08 Extended Abstracts on Human Factors in Computing Systems (Florence, Italy, April 05 - 10, 2008), pp. 2159–2166. ACM, New York, NY (2008) 4. Sy, D.: Adapting usability investigations for Agile user-centered design. Journal of Usability Studies 2(3), 112–132 (2007) 5. Kane, D.: Finding a Place for Discount Usability Engineering in Agile Development: Throwing Down the Gauntlet. In: Proceedings of the Conference on Agile Development (June 25 - 28, 2003), pp. 40. IEEE Computer Society, Los Alamitos, CA (2003) 6. Patton, J.: Hitting the target: adding interaction design to agile software development. In: OOPSLA 2002 Practitioners Reports (Seattle, Washington, November 04 - 08, 2002), pp. 1-ff. ACM, New York, NY (2002) Page 25 of 125
    2010 CRC PhDStudent Conference Model Driven Architecture of Large Distributed Hard Real Time Systems Michael A Giddings mag2@tutor.open.ac.uk Supervisors Dr Pat Allen Dr Adrian Jackson Dr Jan Jürjens, Dr Blaine Price Department/Institute Department of Computing Status Part-time Probation viva Before Starting date 1 October 2008 1. Background Distributed Real-time Process Control Systems are notoriously difficult to develop. The problems are compounded where there are multiple customers and the design responsibility is split up between different companies based in different countries. The customers are typically users rather than developers and the domain expertise resides within organisations whose domain experts have little software expertise. Two types of Distributed real-time Process Control Systems are open loop systems and closed loop systems (with and without feedback). Typical examples are used for the display of sensor data and control of actuators based on sensor data. Typically systems contain a mixture of periodic and event driven processing with states changing much more slowly than individual periodic processing steps. In addition to the functional requirements, non functional requirements are also needed to describe the desired operation of the software system. A number of these requirements may be grouped together as performance requirements. Performance requirements are varied and depend on the particular system to which they refer. In early systems performance was managed late in the development process on a ‘fix it later’ basis. (Smith 1990). As software systems became more sophisticated it became necessary to manage performance issues as early as possible to avoid the cost impact of late detected performance failures. 2. The Problem The need for modelling performance for the early detection of performance failures is well established. (Smith 1990). Recent surveys have shown that the adoption of the Unified Modelling Language (UML) in software systems development remains low at 16% with no expected upturn. The use of trial and error methods in embedded system development remains at 25%. (Sanchez and Acitores 2009). Page 26 of 125
    2010 CRC PhDStudent Conference A number of summary papers exist that list the performance assessment methods and tools. (Smith 2007), (Balsamo, Di Marco et al. 2004), (Koziolek 2009) and (Woodside, Franks et al. 2007). These identify performance assessment methods suitable for event driven systems, client/server systems, layered queuing networks and systems with shared resources. Fifteen performance approaches identified to combat the ‘fix-it-later’ approach have been summarised. (Balsamo, Di Marco et al. 2004). These methods apply to a broad range of software systems and performance requirements. In particular they cover shared resources (Hermanns, Herzog et al. 2002), client/servers (Huhn, Markl et al. 2009) and event driven systems (Staines 2006) (Distefano, Scarpa et al. 2010) and mainly focus on business systems. Each of these performance methods can contribute to the performance analysis of Distributed Real-time Process Control Systems but rely on system architecture and software design being wholly or partly complete. 3. Proposed Solution In this paper I propose modelling individual system elements, sensors, actuators, displays and communication systems as periodic processes associated with a statistical description of the errors and delays. Existing performance methods based on MARTE (OMG 2009) using the techniques described above can be used for individual elements to calculate performance. The proposed methodology, however, enables models to be developed early for systems which comprise individual processing elements, sensors, actuators, displays and controls linked by a bus structure prior to the development of UML models. System architects establish the components and component communications early in the system lifecycle. Tools based on SysML 1.1 (OMG 2008) provide a method of specifying the system architecture. These design decisions frequently occur prior to any detailed performance assessment. Early performance predictions enable performance requirements to be established for individual system elements with a greater confidence than the previous ‘fix-it-later’ approach. (Eeles 2009). It has been claimed (Lu, Halang et al. 2005; Woodside, Franks et al. 2007) that Model Driven Architecture (MDA) (OMG 2003) is able to aid in assessing performance. A periodic processing architecture may enable early assessment of performance by permitting loosely coupled functional elements to be used as building blocks of a system. A high level of abstraction and automatic translation between models can be achieved using functional elements. Platform independent models for the individual components of the system mixed with scheduling information for each component may enable the impact of functional changes and real performance to be assessed early in the development process. Models for individual elements can be combined taking into account that the iteration schedules for each element are not synchronised with each other. These models can be animated or performance calculated with established mathematical methods (Sinha 1994). One way that MDA may be used to provide early performance assessment is to develop a functional model similar to CoRE (Mullery 1979) alongside the UML (OMG 2003) models in the MDA Platform Independent Model. The functional model Page 27 of 125
2010 CRC PhD Student Conference can then be developed by domain experts without any knowledge of software techniques. For central system computers it can also be used to identify classes and methods in the MDA Platform Independent Model by a simple semi-automatic process similar to the traditional noun and verb analysis methods. It can be used to identify simple functional elements which can be implemented as part of a periodic iteration architecture. Animation of these functional elements at the requirements stage may be undertaken in a way which will reflect the actual performance of the computer. Non-periodic processing elements, bus systems, sensors, actuators, displays and controls can be represented by abstractions based on an iteration schedule. This model can be used to specify the requirements for individual elements. Connections between the independent functional elements, which represent the notional data flow across a periodic system, can be used to establish functional chains which identify all the functional elements that relate to each specific end event. Each functional chain can then be analysed into a collection of simple sub-chains, not all of which will have the same performance requirements when combined to meet the overall performance requirement. When each of the sub-chains has been allocated its own performance criteria, individual functional elements can be appropriately scheduled within a scheduling plan, with each element only being scheduled to run sufficiently frequently to meet the highest requirement of each sub-chain in which it appears. This leads to a more efficient use of processing capacity than conventional periodic systems. This provides three opportunities to animate the overall system, which should produce similar results. The first opportunity is to schedule algorithms defined within the definition of each functional element in the functional model associated with the MDA Platform Independent Model. The second opportunity is to animate the object-oriented equivalent of the functional chain in the UML models in the MDA Platform Independent Model (PIM) for the central processing elements. These would combine sequence diagrams, which represent the functional elements of the functional model, with objects and attributes of objects representing the notional data flow. These would be combined with the functional chains for the remaining system elements. The third opportunity is to replace the functional chains generated from the PIM with implemented functional elements from the MDA Platform Specific Models (PSMs). Each animation would use standard iteration architectures to execute each functional element in the right order at the correct moment in accordance with regular predictable scheduling tables. The iteration parameters can be generated in a form which can be applied to each animation opportunity and final implementation without modification. Functional chains can be extracted from the functional model and animated independently, enabling full end-to-end models to be animated using modest computing resources. Page 28 of 125
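The sub-chain scheduling idea described above can be made concrete with a small sketch: each functional element is run at the highest iteration rate demanded by any sub-chain it belongs to, rather than at a single global rate. The element names, the avionics-flavoured example and the use of simple rates in Hz are assumptions made for illustration; a real scheduling plan would also have to respect the iteration table structure discussed in the text.

def element_rates(subchain_requirements):
    """Map each functional element to the highest iteration rate (Hz) required
    by any sub-chain containing it. `subchain_requirements` maps a sub-chain
    (a tuple of functional elements) to the rate needed to meet its
    performance requirement."""
    rates = {}
    for chain, rate in subchain_requirements.items():
        for element in chain:
            rates[element] = max(rate, rates.get(element, 0))
    return rates

subchains = {
    ("airspeed sensor", "filter", "display"): 10,        # display chain: 10 Hz suffices
    ("airspeed sensor", "filter", "stall warning"): 50,  # warning chain needs 50 Hz
}
print(element_rates(subchains))
# {'airspeed sensor': 50, 'filter': 50, 'display': 10, 'stall warning': 50}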
    2010 CRC PhDStudent Conference 4. Conclusion The proposed methodology enables performance to be animated or calculated early in the design process generating models automatically focused on sections of the system which relate to individual performance end events prior to architectural and software structures being finalised. 5. References Balsamo, S., A. Di Marco, et al. (2004). "Model-based performance prediction in software development: a survey." Software Engineering, IEEE Transactions on 30(5): 295-310. Distefano, S., M. Scarpa, et al. (2010). "From UML to Petri Nets: the PCM-Based Methodology." Software Engineering, IEEE Transactions on PP(99): 1-1. Eeles, P. C., Peter (2009). The process of Software Architecting, Addison Wesley Professional. Hermanns, H., U. Herzog, et al. (2002). "Process algebra for performance evaluation." Theoretical Computer Science 274(1-2): 43-87. Huhn, O., C. Markl, et al. (2009). "On the predictive performance of queueing network models for large-scale distributed transaction processing systems." Information Technology & Management 10(2/3): 135-149. Koziolek, H. (2009). "Performance evaluation of component-based software systems: A survey." Performance Evaluation In Press, Corrected Proof. Lu, S., W. A. Halang, et al. (2005). A component-based UML profile to model embedded real-time systems designed by the MDA approach. Embedded and Real-Time Computing Systems and Applications, 2005. Proceedings. 11th IEEE International Conference on. Mullery, G. P. (1979). CORE - a method for controlled requirement specification. Proceedings of the 4th international conference on Software engineering. Munich, Germany, IEEE Press. OMG. (2003). "MDA Guide Version 1.0.1 OMG/2003-06-01." from <http://www.omg.org/docs/omg/03-06-01.pdf>. OMG. (2003). "UML 1.X and 2.x Object Management Group." from www.uml.org. OMG (2008). OMG Systems Modelling Language (SysML) 1.1. OMG (2009). "OMG Profile ‘UML Profile for MARTE’ 1.0." Sanchez, J. L. F. and G. M. Acitores (2009). Modelling and evaluating real-time software architectures. Reliable Software Technologies - Ada-Europe 2009. 14th Ada-Europe International Conference on Reliable Software Technologies, Brest, France, Springer Verlag. Sinha, N. K., Ed. (1994). Control Systems, New Age International. Smith, C. (1990). Perfomance Engineering of software systems, Addison Wesley. Smith, C. (2007). Introduction to Software Performance Engineering: Origins and Outstanding Problems. Formal Methods for Performance Evaluation: 395-428. Staines, T. S. (2006). Using a timed Petri net (TPN) to model a bank ATM. Engineering of Computer Based Systems, 2006. ECBS 2006. 13th Annual IEEE International Symposium and Workshop on. Woodside, M., G. Franks, et al. (2007). The Future of Software Performance Engineering. Future of Software Engineering, 2007. FOSE '07, Minneapolis, MN Page 29 of 125
2010 CRC PhD Student Conference An Investigation Into Design Diagrams and Their Implementations Alan Hayes alanhayes725@btinternet.com Supervisors Dr Pete Thomas Dr Neil Smith Dr Kevin Waugh Department/Institute Computing Department Status Part-time Probation viva After Starting date 1st October 2005 The broad theme of this research is concerned with the application of information technology tools and techniques to automatically generate formative feedback based upon a comparison of two separate, but related, artefacts. An artefact is defined as a mechanism through which a system is described. In the case of comparing two artefacts, both artefacts describe the same system but do so through the adoption of differing semantic and modelling constructs. For example, in the case of a student coursework submission, one artefact would be that of a student-submitted design diagram (using the syntax and semantics of UML class diagrams) and the second artefact would be that of the student-submitted accompanying implementation (using Java syntax and semantics). Both artefacts represent the student's solution to an assignment brief set by the tutor. The design diagram describes the solution using one set of semantic representations (UML class diagrams) whilst the implementation represents the same solution using an alternative set (Java source code). Both artefacts describe the same system and represent a solution to the assignment brief. An alternative example would be that of a student submitting an ERD diagram with an accompanying SQL implementation. This research aims to identify the generic mechanisms needed for a tool to be able to compare two different, but related, artefacts and generate meaningful formative feedback based upon this comparison. A case study is presented that applies these mechanisms to the case of automatically generating formative assessment feedback to students based upon their submission. The specific area of formative feedback being addressed is based upon a comparison between the submitted design and the accompanying implementation. Constituent components described within each artefact are considered to be consistent if, despite the differing modelling constructs, they describe features that are common to both artefacts. The design (in diagrammatic format) is viewed as prescribing the structure and function contained within the implementation, whilst the implementation (source code) is viewed as implementing the design whilst adhering to its specified structure and function. There are several major challenges and themes that feed into this issue. The first is how the consistency between a student-submitted design and its implementation can be measured in such a way that meaningful formative feedback can be generated. This involves being able to represent both components of the student submission in a form that facilitates their comparison. Thomas et al [2005] and Smith et al [2004] describe a method of reducing a student diagram into meaningful minimum components. Tselonis et al Page 30 of 125
    2010 CRC PhDStudent Conference [2005] adopt a graphical representation mapping entities to nodes and relationships to arcs. Consequently, one component of this research addresses how the student submitted design and its source code representation can be reduced to their constituent meaningful components. The second challenge associated with this research addresses the problem of how to facilitate a meaningful comparison between these representations and how the output of a comparison can be utilised to produce meaningful feedback. This challenge is further complicated as it is known that the student submission will contain errors. Smith et al [2004] and Thomas et al [2005] identified that the student diagrams will contain data that is either missing or extraneous. Thomasson et al [2006] analysed the designs of novice undergraduate computer programmers and identified a range of typical errors found in the student design diagrams. Additionally, Bollojou et al [2006] analysed UML modelling errors made by novice analysts and have identified a range of typical semantic errors made. Some of these errors will propagate into the student implementation whilst some will not. This research investigates how such analysis and classifications can be used to support the development of a framework that facilitates the automation of the assessment process. This work will be complemented by an analysis of six data sets collated for this research. Each data set is comprised of a set of student diagrams and their accompanying implementations. It is anticipated that this work will be of interest to academic staff engaged in the teaching, and consequently assessment, of undergraduate computing programmes. It will also be of interest to academic staff considering issues surrounding the prevention of plagiarism. Additionally, it will be of interest to those engaged in the field of software engineering and in particular to those involved in the auditing of documentation and practice. References [1] Higgins C., Colin A., Gray G., Symeonidis P. and Tsintsifas A. 2005 Automated Assessment and Experiences of Teaching Programming. In Journal on Educational Resources in Computing (JERIC) Volume 5 Issue 3, September 2005. ACM Press [2] Thomasson B., Ratcliffe M. and Thomas L., 2005 Identifying Novice Difficulties in Object Oriented Design. In Proceedings of Information Technology in Computer Science Education (ITiCSE ’06), June 2006, Bologna, Italy. [3] Bolloju N. and Leung F. 2006 Assisting Novice Analysts in Developing Quality Conceptual Models with UML. In Communications of the ACM June 2006, Vol 49, No. 7, pp 108-112 [4] Tselonis C., Sargeant J. and Wood M. 2005 Diagram Matching for Human- Computer Collaborative Assessment. In Proceedings of the 9th International conference on Computer Assisted Assessment, 2005. Page 31 of 125
    2010 CRC PhDStudent Conference [5] Smith N., Thomas, P. and Waugh K. (2004) Interpreting Imprecise Diagrams. In Proceedings of the Third International Conference in Theory and Applications of Diagrams. March 22-24, Cambridge, UK. Springer Lecture Notes in Computer Science, eds: Alan Blackwell, Kim Marriott, Atsushi Shimomnja, 2980, 239-241. ISBN 3-540-21268-X. [6] Thomas P., Waugh K. and Smith N., (2005) Experiments in the Automated Marking of ER-Diagrams. In Proceedings of 10th Annual Conference on Innovation and Technology in Computer Science Education (ITiCSE 2005) (Lisbon, Portugal, June 27-29, 2005). Page 32 of 125
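In its simplest form, the comparison between the reduced design and the reduced implementation described in this abstract could report common, missing and extraneous components. The sketch below assumes both artefacts have already been reduced to a mapping from class names to sets of member names; that representation, and the example data, are invented for illustration and are not the tool under development.

def compare(design, implementation):
    """`design` and `implementation` map class names to sets of member
    (attribute or method) names extracted from each artefact."""
    feedback = []
    for cls in design.keys() - implementation.keys():
        feedback.append("Class '%s' appears in the design but not in the code." % cls)
    for cls in implementation.keys() - design.keys():
        feedback.append("Class '%s' appears in the code but not in the design." % cls)
    for cls in design.keys() & implementation.keys():
        for member in design[cls] - implementation[cls]:
            feedback.append("'%s.%s' is designed but not implemented." % (cls, member))
        for member in implementation[cls] - design[cls]:
            feedback.append("'%s.%s' is implemented but not in the design." % (cls, member))
    return feedback

design = {"Account": {"balance", "deposit", "withdraw"}, "Bank": {"accounts"}}
code = {"Account": {"balance", "deposit"}, "Logger": {"log"}}
for line in compare(design, code):
    print(line)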
2010 CRC PhD Student Conference An Investigation into Interoperability of Data Between Software Packages used to Support the Design, Analysis and Visualisation of Low Carbon Buildings Robina Hetherington R.E.Hetherington@open.ac.uk Supervisors Robin Laney Stephen Peake Department/Institute Computing Status Fulltime Probation viva Before Starting date January 2010 This paper outlines a preliminary study into the interoperability of building design and energy analysis software packages. It will form part of a larger study into how software can support the design of interesting and adventurous low carbon buildings. The work is interdisciplinary and is concerned with design, climate change and software engineering. Research Methodology The study will involve a blend of research methods. Firstly, the key literature surrounding the study will be critically reviewed. A case study will look at the modelling of built form, with reflection upon the software and processes used. The model used in the case study will then be used to enable the analysis of data movement between software packages. Finally, conclusions regarding the structures, hierarchies and relationships between interoperable languages used in the process will be drawn. This will inform the larger study into how software can support the design of interesting and adventurous low carbon buildings. Research questions: 1. What are the types of software used to generate building models and conduct the analysis of energy performance? 2. What is the process involved in the movement of data from design software to energy analysis software to enable the prediction of the energy demands of new buildings? 3. What are the potential limitations of current interoperable languages used to exchange data and visualise the built form? Context Software has an important role in tackling climate change; it is "a critical enabling technology" [1]. Software tools can be used to support decision making surrounding climate change in three ways: prediction of the medium to long term effects, formation and analysis of adaptation strategies, and support of mitigation methods. This work falls into the latter category, aiming to reduce the sources of greenhouse gases through energy efficiency and the use of renewable energy sources [2]. Climate change is believed to be caused by increased anthropogenic emissions of greenhouse gases. One of the major greenhouse gases is carbon dioxide. In the UK Page 33 of 125
    2010 CRC PhDStudent Conference the Climate Change Act of 2008 has set legally binding targets to reduce the emission of carbon dioxide by 80% from 1990 levels by 2050 [3]. As buildings account for almost 50% of UK carbon dioxide emissions the necessary alteration of practices related to the construction and use of buildings will have a significant role in achieving these targets [4]. In 2007 the UK Government announced the intention that all new houses would be carbon neutral by 2016 in the “Building a Greener Future: policy statement”. This is to be achieved by progressive tightening of Building Regulations legislation over a number of years [4]. Consultations are currently taking place on the practicalities of legislating for public sector buildings and all new non- domestic buildings to be carbon neutral by 2018 and 2019 respectively [5]. The changes in praxis in the next 20-30 years facing the construction industry caused by this legislation are profound [6]. Software used in building modelling Architecture has gone through significant changes since the 1980s when CAD [Computer Aided Draughting/Design] was introduced. The use of software has significantly altered working practices and enabled imaginative and inspiring designs, sometimes using complex geometries only achievable through the use of advanced modelling and engineering computational techniques. However, the advances in digital design media have created a complex web of multiple types of software, interfaces, scripting languages and complex data models [7]. The types of software used by architects can be grouped into three main categories: CAD software that can be used to generate 2D or 3D visualizations of buildings. This type of software evolved from engineering and draughting practices, using command line techniques to input geometries. This software is mainly aimed at imitating paper based practices, with designs printed to either paper or pdf. Visualization software, generally used in the early design stages for generating high quality renderings of the project. BIM [Building Information Modelling] software has been a significant development in the last few years. BIM software contains the building geometry and spatial relationship of building elements in 3D. It can also hold geographic information, quantities and properties of building components, with each component as an ‘object’ recorded in a backend database. Building models of this type are key to the calculations now required to support zero carbon designs [8]. Examples of BIM software are Revit by Autodesk[9], and ArchiCAD by Graphisoft[10] and Bentley Systems [11] Energy analysis software Analysis software is used to perform calculations such as heat loss, solar gains, lighting, acoustics, etc. This type of analysis is usually carried out by a specialist engineer, often subsequent to the architectural design. The available tools are thus aimed at the expert engineer who have explicit knowledge to run and interpret the results of the simulation. This means that, until recent legislative changes, there was no need for holistic performance assessment to be integrated into design software [12]. Calculation of energy consumption requires a model of the proposed building to make the detailed estimates possible. Examples of expert tools that use models for the calculation are TRNSYS [13], IES Virtual Environment [14], EnergyPlus [15]. 
One tool that supports the architectural design process is Ecotect [16], which has a more intuitive graphical interface and support for conducting a performance analysis [12]. Page 34 of 125
    2010 CRC PhDStudent Conference Energy analysis is one-way iterative process, with geometric meshes and data transferred from the design package to the various analysis tools. Every design iteration will (or should) involve a re-run of the environmental analysis tool [17]. The mesh geometry requires manipulation for this movement into the analysis software from the modelling environment and data such as material properties needs to be re- entried, with a significant penalty in time and possible loss or corruption of data [18][19]. Key research into interoperable languages used in the AEC [Architectural Engineering and Construction] industry A number of interoperable languages, relating to building designs, have been developed since the release of version 1.0 of the XML [eXtensible Markup Languages] standard in February 1998. They include visualisation schemas mainly used for as the source for the display of models: X3D[eXtensible 3D], based on VRML [Virtual Reality Modeling Language], CityGML for the representation of 3D urban objects and COLLADA [COLLAborative Design Activity]. The ifcXML [Industry Foundation Classes eXtensible Markup Language] specification, developed by the IAI [Industrial Alliance for Interoperability], was designed to facilitate the movement of information from and between BIM software. It was designed in a “relational” manner, as a result of the BIM database concept. Accordingly there is concern about the potential file size and complexity of the standard arising from the XML format and the amount of data it can contain [20] [21]. Also, the seamless interoperability it is intended to support has proved to be elusive. Take up has been slow and incomplete with software companies not always supportive [22]. A language designed specifically for interchange of data between design modelling environments and energy analysis packages is gbXML [Green Building eXtensible Markup Language]. In comparison with ifcXML it is considerably simpler and easier to understand [23]. However, it limitations are evident in the geometric detail contained in the file which inhibits the transfer back to the design package [17]. Next stage – a case study This paper has set the case study in context and given the key research in the area of interoperability in AEC projects. In the next stage a small house will be designed in Revit and the environmental design analysed in Ecotect to gain experience in using the tools and enable reflection on the software and procedures involved. ifcXML and gbXML files will be exported and analysed. Future work The software used in this study are all developed by commercial organizations, typically with an incremental, yearly update. New software, such as Ecotect, is often brought in from an independent developer. However, open platforms are generally considered to “promote innovation and diversity more effectively than proprietary ones” [24]. In the field of climate change, given the profound threat to humanity, a community approach is seen as potentially a better way forward [25]. Future work will look at how building design software may evolve to meet the challenge of designing interesting and beautiful low carbon buildings. References [1] S.M. Easterbrook, “First international workshop on software research and climate change,” Proceeding of the 24th ACM SIGPLAN conference companion on Object oriented programming systems languages and applications - OOPSLA '09, Orlando, Florida, USA: 2009, p. 1057. [2] S. Peake and J. 
Smith, Climate change: from science to sustainability, Milton Keynes Page 35 of 125
    2010 CRC PhDStudent Conference [England]; Oxford: Open University; Oxford University Press, 2009. [3] Great Britain, Climate Change Act of 2008, 2008. [4] Department for Communities and Local Government, “Building a Greener Future: policy statement,” Jul. 2007. [5] Zero Carbon Hub, “Consultation on Zero Carbon Non-Domestic Buildings” http://www.zerocarbonhub.org/events_details.aspx?event=3 [Accessed January 28, 2010]. [6] T. Oreszczyn and R. Lowe, “Challenges for energy and buildings research: objectives, methods and funding mechanisms,” Building Research & Information, vol. 38, 2010, pp. 107-122. [7] R. Oxman, “Digital architecture as a challenge for design pedagogy: theory, knowledge, models and medium,” Design Studies, vol. 29, 2008, pp. 99-120. [8] E. Krygiel and B. Nies, Green BIM : successful sustainable design with building information modeling, Indianapolis Ind.: Wiley Pub., 2008. [9] Autodesk, “Revit Architecture Building Information Modeling Software - Autodesk,” Revit Architecture Building Information Modeling Software - Autodesk http://usa.autodesk.com/adsk/servlet/pc/index?id=3781831&siteID=123112 [Accessed April 26, 2010]. [10] Graphisoft, “ArchiCAD 13 - Overview,” ArchiCAD 13 - Overview http://www.graphisoft.com/products/archicad/ [Accessed April 26, 2010]. [11] Bentley, “Construction Software | Architectural Software | Building Information Modeling,” Construction Software | Architectural Software | Building Information Modeling http://www.bentley.com/en-US/Solutions/Buildings/ [Accessed April 26, 2010]. [12] A. Schlueter and F. Thesseling, “Building information model based energy/exergy performance assessment in early design stages,” Automation in Construction, vol. 18, 2009, pp. 153-163. [13] Transsolar Energietechnik GmbH, “TRANSSOLAR Software | TRNSYS Overview,” TRANSSOLAR Software | TRNSYS Overview http://www.transsolar.com/__software/docs/trnsys/trnsys_uebersicht_en.htm [Accessed April 26, 2010]. [14] IES, “IES - Sustainable 3D Building Design, Architecture Software - Integrated Environmental Solutions,” IES - Sustainable 3D Building Design, Architecture Software - Integrated Environmental Solutions http://www.iesve.com/content/default.asp?page= [Accessed April 26, 2010]. [15] U.S. Department of Energy, “Building Technologies Program: EnergyPlus,” Building Technologies Program: EnergyPlus http://apps1.eere.energy.gov/buildings/energyplus/ [Accessed April 26, 2010]. [16] Autodesk, “Autodesk - Autodesk Ecotect Analysis,” Autodesk - Autodesk Ecotect Analysis http://usa.autodesk.com/adsk/servlet/pc/index?siteID=123112&id=12602821 [Accessed April 26, 2010]. [17] N. Hamza and M. Horne, “Building Information Modelling: Empowering Energy Conscious Design,” 3rd Int’l ASCAAD Conference on Em‘body’ing Virtual Architecture, Alexandria, Egypt: . [18] I. Pritchard and E. Willars, Climate Change Toolkit, 05 Low Carbon Design Tools, RIBA, 2007. [19] A. Lawton and D. Driver, “Autodesk Sustainable Design Curriculum 2010 – Lesson 1,” 2010. [20] V. Bazjanac, “Building energy performance simulation as part of interoperable software environments,” Building and Environment, vol. 39, 2004, pp. 879-883. [21] R. Howard and B. Bjork, “Building information modelling – Experts’ views on standardisation and industry deployment,” Advanced Engineering Informatics, vol. 22, 2008, pp. 271-280. [22] R. Jardim-Goncalves and A. Grilo, “Building information modeling and interoperability,” Automation in Construction, 2009. [23] B. Dong, K. Lam, Y. Huang, and G. 
Dobbs, “A comparative study of the IFC and gbXML informational infrastructures for data exchange in computational design support environments,” Tenth International IBPSA Conference, Beijing: IBPSA China: 2007. [24] S. Johnson, “Rethinking a Gospel of the Web,” The New York Times http://www.nytimes.com/2010/04/11/technology/internet/11every.htm?pagewanted=print [Accessed April 26, 2010]. [25] A.A. Voinov, C. DeLuca, R.R. Hood, S. Peckham, C.R. Sherwood, and J.P.M. Syvitski, “A Community Approach to Earth Systems Modeling,” Eos, Transactions American Geophysical Union, vol. 91, 2010, p. 117. Page 36 of 125
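As one possible first step for the planned analysis of the exported ifcXML and gbXML files, the sketch below simply tallies the element types present in an XML export, giving a quick picture of how much of the building model survives the exchange. The file name is invented and no particular schema is assumed.

    import java.io.File;
    import java.util.Map;
    import java.util.TreeMap;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.NodeList;

    // Sketch: count how many elements of each type an exported building model
    // contains, e.g. how many surface or material elements survived the transfer.
    public class ExportInspector {
        public static void main(String[] args) throws Exception {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new File("house_export.xml"));   // illustrative file name

            Map<String, Integer> counts = new TreeMap<String, Integer>();
            NodeList all = doc.getElementsByTagName("*");    // every element in the file
            for (int i = 0; i < all.getLength(); i++) {
                String name = all.item(i).getNodeName();
                Integer c = counts.get(name);
                counts.put(name, c == null ? 1 : c + 1);
            }

            for (Map.Entry<String, Integer> entry : counts.entrySet()) {
                System.out.println(entry.getKey() + ": " + entry.getValue());
            }
        }
    }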
    2010 CRC PhDStudent Conference Understanding Object-Relational Impedance Mismatch: A Framework Based Approach Chris Ireland cji26@student.open.ac.uk Supervisors David Bowers Mike Newton Kevin Waugh Department/Institute Computing Status 5th Year, Part-time Probation viva Completed Starting date 1 October 2005 Research Question Object-relational impedance mismatch is the label used to classify the problems faced by the developer of an object-oriented application that must use a relational database for storage. What is object- relational impedance mismatch, how do we know if a particular strategy is the most appropriate way to address the problems it presents and what can be done to improve the situation? Background In [1] I describe a framework and classification (Figure 1) that provide new insights into the object- relational mapping (ORM) strategies used to address problems of an object-relational impedance mismatch. Concept Conceptual Object Orientation Mismatch Relational (Reconciliation) Language Representation OOPL (e.g. Java) Mismatch SQL (Pattern) Schema Emphasis Application Mismatch DB Schema (Mapping) Behaviour Instance Insta e ur nce ct Mis (Tran match ru St sform ation ) Object State Row Figure 1 - My Conceptual Framework and Classification of Impedance Mismatch What is not clear are how one uses my framework to understand an ORM strategy, where does one start, how does one proceed, what can one expect to discover and how do we understand changes that may improve the situation? Figure 2 provides an overview of one process for using my framework. I Page 37 of 125
    2010 CRC PhDStudent Conference describe this process in more detail in [5]. The process and framework have been validated by comparing and contrasting the outcomes with those possible using the classification of Fussell [6]. Figure 2 - My Framework Based Approach The framework may also be used to understand (possible) solutions to problems of an object-relational impedance mismatch. At the last CRC PhD Student Conference I set an objective to understand the consequences of changes introduced in Object-Relational SQL (OR-SQL) [7] using my framework. OR-SQL is a language level change and may be one solution to problems of an object-relational impedance mismatch. This work is complete and the results have been published in [8]. I found that OR-SQL does not improve the situation and that the term relational database is now overloaded. So what… ORM strategies are not new. There is a body of literature (e.g. Keller [2], Ambler [3], Hohenstein [4]) that provide a description and analysis of each ORM strategy. This analysis is focused on the practical consequences of combining object and relational artefacts rather than understanding the underlying issues with an ORM strategy. Achieving an understanding of the underlying issues is the objective of my framework and process. Analysis using my framework asks that one thinks about an ORM strategy in a new way. In so doing it helps to provide new insights into an ORM strategy, highlight new issues, understand cause and effect, and suggest improvements to an ORM strategy. In [1] (this was awarded a best paper at the conference), [5] and [8] I have shown that the framework and process do provide new insights. These insights provide an opportunity to improve an ORM strategy and the context in which that ORM strategy operates, and to understand how best to make use Page 38 of 125
    2010 CRC PhDStudent Conference of new features in OR-SQL. Such information is useful to standards bodies, tools vendors and those who define an ORM strategy using SQL or OR-SQL. Thinking about the consequences of an ORM strategy provides information necessary to choose between alternatives. This information is invaluable to those who implement an ORM strategy. The Problem The framework provides guidance on the use of my framework but there is still a need for clear guidance on how to compare object and relational representations. What is the basis for a comparison and how might we go about making a comparison? Current Research Activities I am exploring how we might explore the different kinds of impedance mismatch described in Figure 1. To that end I am developing a technique based on equivalence. Problems of an impedance mismatch exist because object and relational representations are different, but how are they equivalent? An object and a relational design reflect aspects of a universe of discourse ([9], p2-1). That universe of discourse provides a point of reference common to both object and relational representations. Whilst each design uses a different conceptual framework, language and structure(s) to describe that universe they are representations of the same universe. So, whilst object and relational representations are different, if we are not to lose information in a round-trip between an object-oriented application and a relational database they must be equivalent descriptions of that universe. The problem is how do we describe that universe without favouring one conceptual framework over another? I introduce a third silo into the framework: the reference silo. The reference silo is currently theoretical and artefacts within it an ideal. In this silo there is a reference concept level, a reference language level, a reference schema level and a reference instance level. Each level provides artefacts for the description of some aspect of a universe of discourse. This description does not need to be perfect, but as a minimum it must be a superset of those semantics and structures that may be described using object and relational artefacts. Identity Entity Identity based independent of on value of a attributes tuple Identity is implicit Identity is Object Identify a Relational explicit Schema particular Schema occurrence Identity of an object Identity of a row   Figure 3 - Exploring Identity Between Object and Relational Representations of an Entity Page 39 of 125
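The contrast that Figure 3 draws can be seen in a small, hypothetical example: object identity is implicit and independent of attribute values, whereas a row's identity is explicit and carried by the value of (part of) the tuple. The class and the pseudo-SQL below are purely illustrative and are not taken from the framework itself.

    // Illustrative only: object identity versus value-based relational identity.
    public class IdentityExample {

        // Object-oriented side: identity is implicit and independent of state.
        static class Employee {
            String name;
            Employee(String name) { this.name = name; }
        }

        public static void main(String[] args) {
            Employee a = new Employee("Smith");
            Employee b = new Employee("Smith");
            System.out.println(a == b);   // false: identical state, two distinct objects
            System.out.println(a == a);   // true:  identity survives any change of state

            // Relational side (pseudo-SQL in a comment): identity is explicit and value-based.
            //   CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name VARCHAR(40));
            //   -- a second row with the same emp_no is rejected; the key value is the identity
        }
    }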
    2010 CRC PhDStudent Conference Employing the reference silo I can then explore those semantics and structures of a reference representation that are captured in an object and a relational representation. Each representation is shown as a set in a Venn diagram (e.g. Figure 3) where, depending on the level of the framework, a set may contain conceptual building blocks, language structures, design representations or data formats. In Figure 3 I provide one example that shows that there is little in common between object and relational representations of identity at the language level. My argument is that only those semantics and structures that are equivalent i.e. they are captured in both representations, can form part of a no- loss transformation between object and relational representations. It follows that current pattern strategies to map identity between object and relational representations (e.g. Blaha [10], p420, Keller [2], p21 and Fowler, in Ambler [11], p285), are at best misguided. The work on equivalence will enhance my process (Figure 2) and provide a more robust approach to exploring individual ORIM problems and ORM strategies. I expect that this will also open up new avenues for research into the nature and development of the reference silo. Remaining Work I have provided a convincing and published body of evidence to support my claims for the framework. The work on equivalence provides the final piece of work for my thesis and will open up new avenues for future research. The work on equivalence necessary for my Thesis will be complete by the summer of 2010. If time permits I would like to publish the work on equivalence before submitting my Thesis in the summer of 2011. References 1. Ireland, C., Bowers, D., Newton, M., Waugh, K.: A Classification of Object-Relational Impedance Mismatch. In: Chen, Q., Cuzzocrea, A., Hara, T., Hunt, E., Popescu, M. (eds.): The First International Conference on Advances in Databases, Knowledge and Data Applications, Vol. 1. IEEE Computer Society, Cancun, Mexico (2009) p36-43 2. Keller, W.: Mapping Objects to Tables: A Pattern Language. In: Bushman, F., Riehle, D. (eds.): European Conference on Pattern Languages of Programming Conference (EuroPLoP), Irsee, Germany (1997) 3. Ambler, S.: Mapping Objects to Relational Databases: O/R Mapping In Detail. (2006) 4. Hohenstein, U.: Bridging the Gap between C++ and Relational Databases. In: Cointe, P. (ed.): European Conference on Object-Oriented Programming, Vol. Lecture Noted on Computer Science 1098. Springer- Verlag, Berlin (1996) 398-420 5. Ireland, C., Bowers, D., Newton, M., Waugh, K.: Understanding Object-Relational Mapping: A Framework Based Approach. International Journal On Advances in Software 2 (2009) 6. Fussell, M.L.: Foundations of Object Relational Mapping. Vol. 2007. ChiMu Corporation (1997) 7. Eisenberg, A., Melton, J.: SQL: 1999, formerly known as SQL3. SIGMOD Record 28 (1999) 119-126 8. Ireland, C., Bowers, D., Newton, M., Waugh, K.: Exploring the use of Mixed Abstractions in SQL:1999 - A Framework Based Approach. In: Chen, Q., Cuzzocrea, A., Hara, T., Hunt, E., Popescu, M. (eds.): The Second International Conference on Advances in Databases, Knowledge and Data Applications, Vol. 1. IEEE Computer Society, Les Menuires, France (2010) TBA 9. Griethuysen, J.J.v. (ed.): Concepts and Terminology for the Conceptual Schema and the Information Base. ISO, New York (1982) 10. Blaha, M.R., Premerlani, W.J., Rumbaugh, J.E.: Relational database design using an object-oriented methodology. 
Communications of the ACM 31 (1988) 414-427 11. Ambler, S.W.: Agile Database Techniques - Effective Strategies for the Agile Software Developer. Wiley (2003) Page 40 of 125
    2010 CRC PhDStudent Conference “Privacy-Shake”, a Haptic Interface for Managing Privacy Settings in Mobile Location Sharing Applications. Lukasz Jedrzejczyk l.jedrzejczyk@open.ac.uk Supervisors Arosha Bandara Bashar Nuseibeh Blaine Price Department/Institute Computing Dept. Status Fulltime Probation viva After Starting date June 2008 Abstract I describe the “Privacy-Shake”, a novel interface for managing coarse grained privacy settings. I built a prototype that enables users of Buddy Tracker, an example location sharing application, to change their privacy preferences by shaking their phone. Users can enable or disable location sharing and change the level of granularity of disclosed location by shaking and sweeping their phone. In this poster I present and motivate my work on Privacy-Shake and report on a lab-based evaluation of the interface with 16 participants. 1. INTRODUCTION The proliferation of location sharing applications raises several concerns related to personal privacy. Some solutions involving location privacy policies have been suggested (e.g., [1]). However, prior research shows that end-users have difficulties in expressing and setting their privacy preferences [2,3]. Setting privacy rules is a time- consuming process, which many people are unwilling to do until their privacy is violated. Moreover, privacy preferences vary across the context, and it is hard to define privacy policy that reflects the dynamic nature of our lives. I see this as a strong motivation to design interfaces that help users update their privacy settings as a consequence of their daily tasks within the system. The underlying requirement of my interface is to provide an efficient, heads-up interface for managing location privacy that does not overwhelm the configuration over action [4]. In order to fulfil this requirement I developed the Privacy-Shake, a haptic interface [5] supporting ad-hoc privacy management. To evaluate the Privacy-Shake interface I conducted a lab-based study to examine its effectiveness and explore users‟ reactions to that technology. I also evaluated several usability aspects of Privacy-Shake and compared its performance against graphical user interface. My study confirmed the potential of haptic interfaces for performing simple privacy tasks and showed that Privacy-Shake can be faster than the GUI. However, my subjective results suggest further work on improving the interface, such as support for individual calibration and personalized gestures for better efficiency. Page 41 of 125
    2010 CRC PhDStudent Conference 2. THE PRIVACY-SHAKE SYSTEM The current prototype of Privacy-Shake is developed in Java and works on Android powered mobile devices. It uses the built in accelerometer to monitor the current position of the device. The application works in a background to save time needed for switching the phone on. The current prototype supports the following settings: visibility (user can enable/disable location sharing) and granularity (changing the level of granularity of disclosed location from exact location to city level location). 2.1 Haptic interaction Due to the dynamic nature of the mobile device, every action has to be initiated by a dynamic, vertical shake. This is required to distinguish the action from the noise generated by user‟s daily movements, e.g. walking, jogging, using a lift. As the system recognizes the movement, vibrational feedback is provided to confirm that the system is ready. Once the system is initiated, a user can change privacy settings by performing one of the following actions: • Vertical movement enables location sharing (Figure 1a), • Horizontal movement (left and right) disables location sharing (Figure 1b), • By moving the phone forward, a user can change the granularity of disclosed location to the city level (Figure 1c), • User instructs the system to share exact location by approximating the phone to his body (Figure 1d). Successful action is confirmed by short vibration (the length depends on the action) and optional auditory message (e.g. natural language message “Anyone can see you”) when the user enables location sharing. Figure 1. Privacy-Shake in action. Arrows present the direction of movement that triggers a privacy-management task. 3. In lab evaluation I conducted a lab-based trial of Privacy-Shake interface to evaluate the usability of the interface and examine both the potential and vulnerabilities of the current prototype. 3.1 Method I recruited 16 participants aged from 23 to 45 for the study, 8 women and 8 men. Most of them had prior experience with motion-capture interaction, mainly from playing the Nintendo Wii. Eleven participants were graduate students, 4 were recruited from the university‟s stuff and the remaining user was recruited outside the university. Participants were asked to complete the following privacy management tasks using Privacy-Shake (results presented in Figure 2): T1. Enable location sharing, T2. Disable location sharing, Page 42 of 125
    2010 CRC PhDStudent Conference T3. Change the granularity of disclosed location to (a) exact location (building level), (b) city level, T4. Disable location sharing using the GUI. The following measures were recorded:  Time to performing a task – from the time when user started the initiation movement to the vibration confirming the action,  Number of successfully completed tasks,  Time of disabling location sharing using the GUI. Participants took part in the study individually, at the beginning of each session I introduced the Privacy-Shake concept and the purpose of the study. Users were presented a short demo of the system and were given a chance to play with the interface prior to performing four privacy management tasks using Privacy-Shake. Each participant had three attempts to perform each task. At the end of each session I asked participants to complete a questionnaire to rate the Privacy-Shake. 3.2 Results Twelve participants reported that learning how to use the Privacy-Shake was easy (2 users reported that it was difficult), 12 of them said that it is also easy to remember how to use it, as the interaction is simple and intuitive. However, 4 users said that they would not like to use it due to the awkwardness of the interface and potential harm it may cause, e.g. accidentally pushing people in a crowded bus. Figure 2. Bar chart presents the percentage of successfully completed tasks (efficiency) during the study. Four participants reported that using Privacy-Shake was annoying and six of them said that it caused frustration, which is related to the problems their experienced with the interface. Only five users managed to successfully complete each privacy management task using Privacy-Shake. Three users could not disable their location sharing and nine users had problems changing the granularity of disclosed location. The biggest difficulty users experienced was with task 3b, only three users successfully completed the task three times. More than a half of all attempts to perform this task were unsuccessful (58%). Only task T1 was successfully completed by all users, thirteen participants disabled location sharing using Privacy-Shake and ten of them successfully changed the granularity of location to city level. Two users successfully completed 11 of 12 attempts, which was the best result during the study. Page 43 of 125
    2010 CRC PhDStudent Conference 58% of all attempts were successful. I observed that females performed slightly better at using Privacy-Shake with 64% efficiency versus 53% for males. 4. CONCLUSIONS AND FUTURE WORK I presented the concept and initial results of the evaluation of Privacy-Shake, a novel interface for „heads-up‟ privacy management. The chosen demographic was not broad, but the study helped me identify both social and technical issues related to the interface. One of the main issues I found were lack of individual calibration and support for more discreet movements, which highlights the future research agenda for my work on Privacy-Shake. Though the actual efficiency is not ideal, the comparison between the mean time of performing tasks T2 (6 seconds) and T4 (18 seconds) shows that haptic interface can be successfully used to perform some basic privacy management tasks faster than the traditional GUI. The Privacy-Shake concept received a positive feedback, which encourages me to continue the work on improving the interface and enhancing the user experience. Further work is also needed to extend the functionality of Privacy-Shake by implementing new gestures for managing group settings or expressing more fine-grained preferences. 5. REFERENCES [1] G. Myles, A. Friday, and N. Davies, “Preserving Privacy in Environments with Location-Based Applications,” IEEE Pervasive Computing, vol. 2, 2003, pp. 56- 64. [2] L.F. Cranor and S. Garfinkel, Security and usability: designing secure systems that people can use, O'Reilly Media, Inc., 2005. [3] N. Sadeh, J. Hong, L. Cranor, I. Fette, P. Kelley, M. Prabaker, and J. Rao, “Understanding and capturing people‟s privacy policies in a people finder application,” The Journal of Personal and Ubiquitous Computing, vol. 13, Aug. 2009, pp. 401-412. [4] S. Lederer, I. Hong, K. Dey, and A. Landay, “Personal privacy through understanding and action: five pitfalls for designers,” Personal Ubiquitous Computing, vol. 8, 2004, pp. 440-454. [5] S. Robinson, P. Eslambolchilar, and M. Jones, “Sweep-Shake: finding digital resources in physical environments,” Proc. of Mobile HCI'09, ACM, 2009, p.12. Page 44 of 125
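For readers unfamiliar with accelerometer handling on Android, the fragment below sketches the kind of listener that the initiation gesture of Section 2.1 implies: a strong vertical acceleration arms the interface and a short vibration confirms it. The threshold, class names and wiring are illustrative assumptions, not the actual prototype code.

    import android.hardware.Sensor;
    import android.hardware.SensorEvent;
    import android.hardware.SensorEventListener;
    import android.hardware.SensorManager;
    import android.os.Vibrator;

    // Sketch of an initiation-gesture detector: a strong vertical acceleration
    // (well above gravity alone) is taken as the dynamic vertical shake that arms
    // the interface; a short vibration confirms it.
    public class ShakeDetector implements SensorEventListener {
        private static final float SHAKE_THRESHOLD = 18f;  // m/s^2, illustrative value
        private final Vibrator vibrator;
        private boolean armed = false;

        public ShakeDetector(SensorManager sensorManager, Vibrator vibrator) {
            this.vibrator = vibrator;
            sensorManager.registerListener(this,
                    sensorManager.getDefaultSensor(Sensor.TYPE_ACCELEROMETER),
                    SensorManager.SENSOR_DELAY_GAME);
        }

        @Override
        public void onSensorChanged(SensorEvent event) {
            float verticalAccel = Math.abs(event.values[1]);   // y axis of the device
            if (!armed && verticalAccel > SHAKE_THRESHOLD) {
                armed = true;                 // system is now ready for a privacy gesture
                vibrator.vibrate(100);        // haptic confirmation, as in Section 2.1
            }
        }

        @Override
        public void onAccuracyChanged(Sensor sensor, int accuracy) { /* not needed here */ }
    }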
    2010 CRC PhDStudent Conference Designing a Climate Change Game for Interactive Tabletops Stefan Kreitmayer stefan@kreitmayer.com Supervisors Dr. Robin Laney Department/Institute Computing Status Visiting Research Student Probation viva n.a. Starting date February – June 2010 During my 4 months visiting studentship I am developing a game that utilises the affordances of multi-user interaction with tabletop surfaces for a persuasive goal. Players' beliefs about some of the risks of man-made global climate destabilisation should be influenced in a way that supports more responsible behaviour. Persuasive games for personal computers are widespread in practice[1][2], and there is abundant literature suggesting theoretical frameworks and design guidelines[3]. Similarly, designing applications for interactive tabletops is an active field of research. However, there are currently not many persuasive games for interactive tabletops, and emerging design issues have not been fully addressed in the literature. With a growing awareness of the persuasive potential of computer games, and interactive tabletops becoming increasingly affordable, it is to be expected that more game designers will address this medium in the near future. Beyond usability questions, designers will face questions resulting from contradicting paradigms. While the affordances of tabletops to support multi-user collaboration are permanently highlighted[4], the computer game area is only just emerging out of a long tradition of single-user and competitive gameplay[5]. Currently the vast majority of persuasive games are designed for browsers and mobile phones, aimed at single users. Fogg[6] explains fundamental differences in the way persuasion works in single user interaction as opposed to group interaction, and this can be incorporated into design for tabletops. This research aims to contribute towards understanding some of the apparent points of friction between two media and two areas of research. With this in mind, my research question can be summarised as follows: Do players perceive a game's moral message differently depending on whether they engage in collaborative, cooperative, or competitive gameplay? As the single message of the game, I chose out of the vast climate change discourse a fact which is commonly accepted to be true, can be easily conveyed to a broad audience in a small amount of time, but at the same time is not over-advertised in the media. The message is as follows: Architectural designs with most of the window area facing the sun help to save heating energy, thereby supporting CO2 mitigation and lowering the risk of climate change effects. Page 45 of 125
    2010 CRC PhDStudent Conference I am planning to develop three versions of the tabletop game which all share the same interface, aesthetic, mechanics, and message. Differences should focus on the supported gameplay: collaborative, cooperative, or competitive respectively. Here we define the three concepts according to [5]: Collaborative gameplay implies that goals, rewards, and penalties are shared among players. Cooperative gameplay differs in that each player eventually wants to reach their individual goal and reward, but they may occasionally choose to collaborate, if the collaboration supports their individual goal. Competitive gameplay means that “the goals of the players are diametrically opposed” [5]. For the sake of simplicity all three versions of the game are designed for two players. A quantitative user study will be conducted to assess the different impacts on players' opinions, depending on which version of the game they have played. Experiments could take place in a public space or in the laboratory. I am planning an experiment with 30 pairs of players, divided into 3 balanced groups, each group engaging with a different type of gameplay: 10 pairs play the collaborative game, 10 pairs play the cooperative game, and 10 pairs play the competitive game. Before and after playing, players should answer questionnaires similar in content to those in the American Climate Values Survey[7]. Using a Likert scale, results can be analysed quantitatively. For more qualitative results, a second experiment could be done with the same participants at the same place and time. After a pair has played their game and completed the questionnaires, they are invited to play the other games as well and give statements about their impressions of whether and how their opinions have changed in relation to different types of gameplay. References: [1] http://persuasivegames.com/ [2] http://www.valuesatplay.org/ [3] I. Bogost. Persuasive Games: The Expressive Power of Videogames. MIT Press, 2007 [4] E. Hornecker, P. Marshall, N. Dalton, Y. Rogers. Collaboration and interference: Awareness with mice or touch input. In: Proceedings of the ACM 2008 conference on Computer supported cooperative work, 8-12 Nov 2008, San Diego, CA, USA. [5] J. P. Zagal, J. Rick, I. Hsi. Collaborative games: Lessons learned from board games. SIMULATION & GAMING, Vol. 37 No. 1, March 2006 24-40. Sage Publications [6] B. J. Fogg. Persuasive Technology: Using Computers to Change What We Think and Do. Morgan Kaufmann, 2003 [7] ECOAMERICA.NET. The American Climate Values Survey. Available at http://www.ecoamerica.org/docs/ecoAmerica_ACVS_Summary.pdf Last Accessed 26 Mar 2010. Page 46 of 125
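A minimal sketch of the planned quantitative analysis, assuming each player's questionnaire responses are reduced to a single 1-5 Likert score before and after play: the fragment computes the mean opinion shift for one gameplay condition. The figures are invented, and a real analysis would also involve significance testing.

    // Sketch: mean pre/post attitude shift for one gameplay condition.
    public class LikertShift {
        public static void main(String[] args) {
            double[] before = {2.4, 3.1, 2.8, 3.0, 2.5};  // mean Likert score per player, pre-game
            double[] after  = {3.2, 3.4, 3.1, 3.5, 2.9};  // same players, post-game

            double totalShift = 0;
            for (int i = 0; i < before.length; i++) {
                totalShift += after[i] - before[i];
            }
            System.out.printf("Mean shift for this condition: %.2f%n",
                    totalShift / before.length);
        }
    }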
    2010 CRC PhDStudent Conference REASONING ABOUT FLAWS IN SOFTWARE DESIGN: DIAGNOSIS AND RECOVERY TAMARA LOPEZ T.LOPEZ@OPEN.AC.UK Supervisors Marian Petre, Charles Haley and Bashar Nuseibeh Department Computing Status Full-time Probation Viva Before Starting Date February 2010 Since its diagnosis at the 1960’s NATO conferences as one of the key problems in computing[1, 2], the provision of reliable software has been a core theme in software engineering research. One strand of this research analyzes software that fails, while a second develops and tests techniques for ensuring software success. Despite these efforts, the threat of failure and the quest for a multivalent yet com- prehensive ”sense” of quality[2] remain powerful drivers for research and provoca- tive tropes in anecdotal accounts of computing[3]. However, current analytical approaches tend to result in overly broad accounts of why software fails or in overly narrow views about what is required to make software succeed. This sug- gests a need for a different approach toward the study of failure that can address the complexities of large scale ”systems-of-systems”[4, 5], while accounting for the effects and trajectories of specific choices made within software initiatives. To address this gap, this research asks: How does failure manifest in actual soft- ware development practice? What constitutes a flaw, and what are the conditions surrounding its occurrence and correction? What can adopting a situational ori- entation tell us more generally about why some software fails and other software succeeds? Background Within computing literature, failure analysis typically takes two perspectives: Page 47 of 125
    2010 CRC PhDStudent Conference • Systemic analyses identify weak elements in complex organizational, operational and software systems. Within these systems, individual or multiple faults become active at a moment in time or within a clearly bounded interval of time, and result in catastrophic or spectacular op- erational failure[6, 7]. Alternatively, software deemed ”good enough” is released into production with significant problems that require costly main- tenance, redesign and redevelopment[8, 5]. • Means analyses treat smaller aspects or attributes of software engi- neering as they contribute to the goal of creating dependable software[4]. These studies develop new or test existing techniques to strengthen all stages of development such as requirements engineering[9], architectural structuring[10], testing and maintenance [11] and verification and validation[12]. Systemic analyses produce case studies and often do not conclude with specific, precise reasons for failure. Instead they retrospectively identify the system or sub- system that failed, and provide general recommendations for improvement going forward. Even when they do isolate weaknesses in the processes of software creation or in particular software components, they do not produce general frameworks or models that can be extended to improve software engineering practice. Means analyses employ a range of methods including statistical, program analy- sis, case study development, formal mathematical modeling and systems analysis. Frequently, they examine a single part of the development process, with a corre- sponding focus on achieving a single dependability mean[4]. The studies are often experimental, applying a set of controlled techniques to existing bodies of soft- ware in an effort to prove, verify and validate that software meets a quantifiable, pre-determined degree of ”correctness”. Methodology This research will produce an analysis of the phenomenon of failure that lies some- where between the broad, behavioral parameters of systemic analyses and the narrowly focused goals of means analyses. To do this, it will draw upon recent software engineering research that combines the socially oriented qualitative ap- proaches of computer supported cooperative work(CSCW) with existing software Page 48 of 125
    2010 CRC PhDStudent Conference analysis techniques to provide new understandings of longstanding problems in software engineering. In one such group of studies, de Souza and collaborators have expanded the notion of dependency beyond its technical emphasis on the ways in which software components rely on one another, demonstrating that hu- man and organizational factors are also coupled to and expressed within software source code[14, 15]. In a study published in 2009, Aranda and Venolia made a case for developing rich bug histories using qualitative analyses in order to reveal the complex interdependencies of social, organizational and technical knowledge that influence and inform software maintenance[16]. In the manner of this and other cooperative and human aspects of software en- gineering(CHASE) work, the research described here will apply a combination of analytic and qualitative methods to examine the role of failure in the software development process as it unfolds. Studies will be designed to allow for analysis and examination of flaws within a heterogeneous artifact universe, with particu- lar emphasis given to the interconnections between technical workers and artifacts. Ethnographically informed techniques will be used to deepen understanding about how the selected environments operate, and about how notions of failure and re- covery operate within the development processes under investigation. References [1] P. Naur and B. Randell, “Software engineering: Report on a conference sponsored by the NATO Science Committee Garmisch, Germany, 7th to 11th October 1968,” NATO Science Committee, Scientific Affairs Division NATO Brussels 39 Belgium, Tech. Rep., January 1969. [Online]. Available: http://homepages.cs.ncl.ac.uk/brian.randell/NATO/ [2] J. Buxton and B. Randell, “Software engineering techniques: Report on a conference sponsored by the NATO Science Committee Rome, Italy, 27th to 31st October 1969,” NATO Science Committee, Scientific Affairs Division NATO Brussels 39 Belgium, Tech. Rep., April 1970 1970. [Online]. Available: http://homepages.cs.ncl.ac.uk/brian.randell/NATO/ [3] R. Charette, “Why software fails,” IEEE Spectrum, vol. 42, no. 9, pp. 42–49, 2005. [4] B. Randell, “Dependability-A unifying concept,” in Proceedings of the Conference on Com- puter Security, Dependability, and Assurance: From Needs to Solutions. IEEE Computer Society Washington, DC, USA, 1998. [5] ——, “A computer scientist’s reactions to NPfIT,” Journal of Information Technology, vol. 22, no. 3, pp. 222–234, 2007. Page 49 of 125
    2010 CRC PhDStudent Conference [6] N. G. Leveson and C. S. Turner, “Investigation of the Therac-25 accidents,” IEEE Computer, vol. 26, no. 7, pp. 18–41, 1993. [7] B. Nuseibeh, “Ariane 5: Who dunnit?” IEEE Software, vol. 14, pp. 15–16, 1997. [8] D. Ince, “Victoria Climbie, Baby P and the technological shackling of British childrens social work,” Open University, Tech. Rep. 2010/01, 2010. [9] T. Thein Than, M. Jackson, R. Laney, B. Nuseibeh, and Y. Yu, “Are your lights off? Using problem frames to diagnose system failures,” Requirements Engineering, IEEE International Conference on, vol. 0, pp. v–ix, 2009. [10] H. S¨zer, B. Tekinerdoˇan, and M. Ak¸it, “FLORA: A framework for decomposing software o g s architecture to introduce local recovery,” Software: Practice and Experience, vol. 39, no. 10, pp. 869–889, 2009. [Online]. Available: http://dx.doi.org/10.1002/spe.916 [11] F.-Z. Zou, “A change-point perspective on the software failure process,” Software Testing, Verification and Reliability, vol. 13, no. 2, pp. 85–93, 2003. [Online]. Available: http://dx.doi.org/10.1002/stvr.268 [12] A. Bertolino and L. Strigini, “Assessing the risk due to software faults: Estimates of failure rate versus evidence of perfection,” Software Testing, Verification and Reliability, vol. 8, no. 3, pp. 155–166, 1998. [Online]. Available: http://dx.doi.org/10.1002/(SICI)1099- 1689(1998090)8:3¡155::AID-STVR163¿3.0.CO;2-B [13] Y. Dittrich, D. W. Randall, and J. Singer, “Software engineering as cooperative work,” Computer Supported Cooperative Work, vol. 18, no. 5-6, pp. 393–399, 2009. [14] C. R. B. de Souza, D. Redmiles, L.-T. Cheng, D. Millen, and J. Patterson, “Sometimes you need to see through walls: A field study of application programming interfaces,” in CSCW ’04: Proceedings of the 2004 ACM conference on Computer supported cooperative work. New York, NY, USA: ACM, 2004, pp. 63–71. [15] C. de Souza, J. Froehlich, and P. Dourish, “Seeking the source: Software source code as a social and technical artifact,” in GROUP ’05: Proceedings of the 2005 international ACM SIGGROUP conference on Supporting group work. New York, NY, USA: ACM, 2005, pp. 197–206. [16] J. Aranda and G. Venolia, “The secret life of bugs: Going past the errors and omissions in software repositories,” in Proceedings of the 2009 IEEE 31st International Conference on Software Engineering. IEEE Computer Society, 2009, pp. 298–308. Page 50 of 125
    2010 CRC PhDStudent Conference Presupposition Analysis in Requirements Lin Ma l.ma@open.ac.uk Supervisors Prof. Bashar Nuseibeh Prof. Anne De Roeck Dr. Paul Piwek Dr. Alistair Willis Department/Institute Department of Computing Status Fulltime Probation viva After Starting date 1-Feb-2009 Motivation Natural language is the most commonly used representation language in requirements engineering [1]. However, compared with formal logics, natural language is inherently ambiguous and lacks a formal semantics [2]. Communicating requirements perfectly through natural language is thus not easy. Examining the linguistic phenomena in natural language requirements can help with decoding what a person means in communication. This method was originally used in psychotherapy and then adopted in requirements engineering [3]. Presupposition is one of these linguistic phenomena. It simplifies communication by pointing to references to bits of knowledge that are taken for granted by the document writer. In requirements engineering, however, we must know exactly what information we’ve lost by simplification, or we run the risk of a misunderstanding. For instance, the requirement (1) Accessibility in the experimental hall is required for changing the piggy board where the device will be mounted. commits the reader to the presuppositions that there is an experimental hall, there is a piggy board and there is a device. These types of implicit commitments might be misinterpreted or overlooked due to different background knowledge in the other stakeholder’s domain. More precisely, for instance, concerning the presupposition that there is a piggy board in example (1), the reader of this requirement may know a piggy board A and choose to believe A is the thing that the document writer is writing about. However, the document writer may mean piggy board B or just any new piggy board. In this research, we propose to use natural language processing techniques for automatically detecting such implicit commitments in requirements documents, and identifying which of those are not made explicit. Background Presuppositions are triggered by certain types of syntactic structures – presupposition triggers [4]. Therefore, presuppositions can be found by identifying the triggers in the Page 51 of 125
    2010 CRC PhDStudent Conference text. The presupposition trigger types can be divided into two general classes – definite descriptions (noun phrases starting with determiners such as the piggy board in example (1)) and other trigger types (for example, cleft - It + be + noun + subordinate clause, stressed constituents - words in italic in texts). Definite descriptions differ from other trigger types because they occur very frequently in all styles of natural language [5], are easy to retrieve (because of their distinct structure with the determiner the) and they often have possible referential relations with earlier text [6]. We hence focus on presuppositions triggered by definite descriptions in this research. One major problem in the study of presupposition is presupposition projection. An elementary presupposition is a presupposition of part of an utterance. Presupposition projection, as the name suggests, is the study of whether an elementary presupposition is a presupposition of the whole utterance (termed as actual presupposition). Here two examples are given for distinct scenarios in requirements, one where an elementary presupposition projects out and one where it does not: (2) a. If funds are inadequate, the system will notify…. b. If there is a system, the system will notify… Intuitively, when a reader accepts utterance (2b), he/she does not take the presupposition that there is a system for granted. The elementary presupposition that there is a system in the consequent of the conditional somehow does not project. The same elementary presupposition that there is a system nevertheless projects out in example (2a), which signals to the reader that the document writer takes for granted that there is a system. Methodology The Binding Theory [7] of presupposition is a widely accepted formal framework for modelling presupposition, in which presupposition is viewed as anaphora (anaphora are expressions, such as a pronoun, which depends for its interpretation on a preceding expression, i.e., an antecedent). Presupposition projection is treated as looking for a path to an earlier part of the discourse which hosts an antecedent that can bind the presupposition. Whenever an antecedent is found in the discourse, the presupposition is bound, and thus does not project out. Therefore, according to the Binding Theory, the actual presuppositions in a discourse are those which do not have any antecedent existing earlier in the discourse. We adopt this view as the theoretical ground. [8] presents an automated approach for classifying definite descriptions. This approach is compatible with the Binding Theory. It classifies definite descriptions as: Discourse new: those that are independent from previous discourse elements for the description interpretation (according to the Binding Theory, discourse new definite descriptions introduce actual presuppositions with respect to a discourse, because they do not have any antecedent); Page 52 of 125
    2010 CRC PhDStudent Conference Anaphoric: those that have co-referential 1 (co-reference is defined as multiple expressions in a sentence or document have the same referent) antecedents in the previous discourse; Bridging [9]: those that either (i) have an antecedent denoting the same discourse entity, but using a different head noun (e.g. a house . . . the building), or (ii) are related by a relation other than identity to an entity already introduced in the discourse (e.g. the partial relation between memory…the buffer). Given example (3), “the experimental hall” has an antecedent in the previous sentence – “an experiment hall”, so it will be classified as anaphoric. If we somehow have the knowledge that a piggy board is a small circuit board mounted on a larger board, “the piggy board” is a bridging definite description referring to part of “PAD boards”. Finally, “the device” is a discourse new definite description which triggers the actual presupposition that there is a device with respect to the discourse. (3) An experimental hall shall be built…. PAD boards shall be used…. Accessibility in the experimental hall is required for changing the piggy board where the device will be mounted. In [8], the authors used a set of heuristics based on an empirical study of definite descriptions [6] for performing the classification task. The heuristics include, for example: For discourse new definite descriptions: one of the heuristics is to examine a list of special predicates (e.g. fact). If the head noun of the definite description appears in the list, it is classified as discourse new. For anaphoric definite descriptions: matching the head noun and modifiers with earlier noun phrases. If there is a matching, it is classified as anaphoric. For example, An experimental hall…the experimental hall. For bridging: one of the heuristics is to use WordNet [10] for identifying relations between head nouns with earlier noun phrases. If there is a relation, such as a part-of relation, it is classified as bridging. For example, PAD boards…the piggy board. However, as stated by the authors of [8], this approach is insufficient to deal with complex definite descriptions with modifiers and lacks a good knowledge base to resolve the bridging definite descriptions (WordNet performed really poor in this case). In my research, we will further develop this approach and implement a software system that is able to analyze the projection behavior of presuppositions triggered by definite descriptions in requirements documents. The development focus is on analyzing modifiers of definite descriptions and making use of external knowledge sources (such as ontologies built upon Wikipedia [11]) for resolving bridging definite descriptions. Especially for bridging definite descriptions, if the relation can be 1 In a strict sense, the concept of anaphora is different from co-reference because the former requires the meaning of its antecedents to interpret, but the latter do not. Here they are used as synonymies as multiple expressions in a sentence or document have the same referent. Page 53 of 125
    2010 CRC PhDStudent Conference identified in the knowledge base, it will help with making a choice between creating a new discourse entity or picking up an existing antecedent. As a result, the actual presuppositions (the discourse new definite descriptions) can be identified. The system will be evaluated through existing corpora with annotated noun phrases, such as the GNOME corpus [12]. We will also manually annotate several requirements documents and perform the evaluation on the annotation results. References [1] L. Mich and R. Garigliano, “NL-OOPS: A requirements analysis tool based on natural language processing,” Proceedings of the 3rd International Conference on Data Mining Methods and Databases for Engineering,, Bologna, Italy: 2002. [2] V. Gervasi and D. Zowghi, “Reasoning about inconsistencies in natural language requirements,” ACM Transactions on Software Engineering and Methodology (TOSEM), vol. 14, 2005, pp. 277–330. [3] R. Goetz and C. Rupp, “Psychotherapy for system requirements,” Cognitive Informatics, 2003. Proceedings. The Second IEEE International Conference on, 2003, pp. 75–80. [4] S.C. Levinson, Pragmatics, Cambridge, UK: Cambridge University Press, 2000. [5] J. Spenader, “Presuppositions in Spoken Discourse,” Phd. Thesis, Department of Linguistics Stockholm University, 2002. [6] M. Poesio and R. Vieira, “A corpus-based investigation of definite description use,” Computational Linguistics, vol. 24, 1998, pp. 183–216. [7] R.A. Van der Sandt and B. Geurts, “Presupposition, anaphora, and lexical content,” Text Understanding in LILOG, O. Herzog and C. Rollinger, Eds., Springer, 1991, pp. 259-296. [8] R. Vieira and M. Poesio, “An empirically based system for processing definite descriptions,” Computational Linguistics, vol. 26, 2000, pp. 539–593. [9] H.H. Clark, “Bridging,” Thinking, 1977, pp. 411–420. [10] C. Fellbaum, WordNet: An Electronic Lexical Database, Cambridge, MA: MIT press, 1998. [11] M.C. Müller, M. Mieskes, and M. Strube, “Knowledge Sources for Bridging Resolution in Multi-Party Dialog,” Proceedings of the 6th International Conference on Language Resources and Evaluation, Marrakech, Morocco: 2008. [12] M. Poesio, “Annotating a corpus to develop and evaluate discourse entity realization algorithms: issues and preliminary results,” Proc. of the 2nd LREC, 2000, pp. 211–218. Page 54 of 125
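To make the classification heuristics concrete, here is a toy version: a definite description counts as anaphoric if its head noun matches the head of an earlier noun phrase, as discourse new if the head appears on a list of special predicates, and otherwise remains a bridging candidate for knowledge-based resolution. The matching and word lists are deliberately simplistic and are not the system described in [8].

    import java.util.Arrays;
    import java.util.List;

    // Toy classifier for definite descriptions, in the spirit of the heuristics above.
    public class DefiniteDescriptionClassifier {

        // Head nouns such as "fact" tend to introduce discourse-new definite descriptions.
        private static final List<String> SPECIAL_PREDICATES =
                Arrays.asList("fact", "idea", "result");

        static String classify(String headNoun, List<String> earlierHeadNouns) {
            if (earlierHeadNouns.contains(headNoun)) {
                return "anaphoric";          // head noun matches an earlier noun phrase
            }
            if (SPECIAL_PREDICATES.contains(headNoun)) {
                return "discourse new";      // triggers an actual presupposition
            }
            return "bridging candidate";     // needs lexical or world knowledge to resolve
        }

        public static void main(String[] args) {
            // Earlier noun-phrase heads, as in example (3): "an experimental hall", "PAD boards".
            List<String> earlier = Arrays.asList("hall", "boards");

            System.out.println(classify("hall", earlier));    // anaphoric
            System.out.println(classify("fact", earlier));    // discourse new
            // "device": no match and no known relation, so a fuller system would
            // go on to classify it as discourse new, as in example (3).
            System.out.println(classify("device", earlier));  // bridging candidate
        }
    }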
    2010 CRC PhDStudent Conference Merging Verifiable and Evolving Access Control Properties Lionel Montrieux L.M.C.Montrieux@open.ac.uk Supervisors Dr Charles B. Haley, C.B.Haley@open.ac.uk Dr Yijun Yu, Y.Yu@open.ac.uk Department Computing Status Full-time Probation viva not passed Starting date October 2009 1 Introduction Recent years have seen a strong advance in formal methods for security [J¨r05]. Many success u have been obtained: many security protocols have been proved to be flawed, and many others to be correct in a precise sense delimiting exactly their applicability. UMLsec is an extension of UML that allows developers to waive security aspects into a standard UML model. The UMLsec tool [J¨r04] allows them to check that their models satisfy the security u properties they want to enforce. Yet, the growing demand to evolve systems continuously raises new questions and new research opportunities. Not only is it necessary to make sure that a system meets security requirements, but it is also crucial to make sure that those requirements are still met by the system on each step of its constant evolution. Hence, it is necessary to develop processes and tools that help the developers ensuring lifelong compliance to security, privacy or dependability requirements. Specifically, access control plays an important role in protecting assets from unauthorised access. Several access control models, like Role-Based Access Control (RBAC) [SFK00] or Organization- Based Access Control (OrBAC) [ABB+ 03] have been defined to help administrators grant permis- sions to users in an easy and scalable way, while allowing permission changes to be easily made. With complex software, maintaining a sound access control infrastructure and ensuring properties like separation of duty can become a challenge. Processes and tools that can verify such properties against a given model as well as all of its evolutions are necessary to increase confidence in one’s access control infrastructure. 2 Verification of Access Control properties in UMLsec The verification process we propose is made of three different parts: first, we want to extend the existing RBAC specification in UMLsec to allow one to specify more complex access control properties. Then, we want to verify that a given model actually enforces the UMLsec access control specification. Finally, we generate code that conforms to the access control property that has previously been defined and verified. Page 55 of 125
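As a rough, hypothetical illustration of the kind of access control structure such verification has to reason about (this is neither UMLsec notation nor the UMLsec tool's API), the sketch below models users, roles and permissions and checks a simple static separation-of-duty rule: no user may hold two conflicting roles.

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Minimal RBAC-style model with a static separation-of-duty check.
    // Role, user and permission names are invented for illustration.
    public class RbacSketch {
        private final Map<String, Set<String>> userRoles = new HashMap<String, Set<String>>();
        private final Map<String, Set<String>> rolePermissions = new HashMap<String, Set<String>>();

        void assignRole(String user, String role) {
            if (!userRoles.containsKey(user)) {
                userRoles.put(user, new HashSet<String>());
            }
            userRoles.get(user).add(role);
        }

        void grantPermission(String role, String permission) {
            if (!rolePermissions.containsKey(role)) {
                rolePermissions.put(role, new HashSet<String>());
            }
            rolePermissions.get(role).add(permission);
        }

        boolean isAllowed(String user, String permission) {
            Set<String> roles = userRoles.get(user);
            if (roles == null) return false;
            for (String role : roles) {
                Set<String> perms = rolePermissions.get(role);
                if (perms != null && perms.contains(permission)) return true;
            }
            return false;
        }

        // Static separation of duty: no user may hold both conflicting roles.
        boolean violatesSeparationOfDuty(String roleA, String roleB) {
            for (Set<String> roles : userRoles.values()) {
                if (roles.contains(roleA) && roles.contains(roleB)) return true;
            }
            return false;
        }

        public static void main(String[] args) {
            RbacSketch model = new RbacSketch();
            model.assignRole("alice", "clerk");
            model.grantPermission("clerk", "create_payment");
            model.grantPermission("auditor", "approve_payment");

            System.out.println(model.isAllowed("alice", "create_payment"));          // true
            System.out.println(model.violatesSeparationOfDuty("clerk", "auditor"));  // false
        }
    }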
2.1 Extending the UMLsec specification of RBAC

UMLsec includes a set of properties to specify RBAC permissions, using the RBAC stereotype on an activity diagram [Jür05]. However, it supports only a limited subset of the RBAC standard. We want to develop it to include other levels of RBAC standard compliance, as well as other similar access control models, like OrBAC. We also want to model authentication procedures using UMLsec, and to allow one to automatically integrate the UMLsec property into other diagrams, like class diagrams and sequence diagrams, once the initial property has been defined on one or several activity diagrams.

Other approaches have been proposed to model RBAC permissions on UML models, like SecureUML [LBD02]. SecureUML differs from UMLsec in that it focuses on RBAC only. The way RBAC properties are represented is also different: instead of using stereotypes and tagged values to annotate the model, the SecureUML approach adds classes to a class diagram to describe users, roles and permissions, and uses OCL [OMG10] to describe additional constraints. Access control directives, like EJB configuration files, can also be generated from a SecureUML model.

2.2 Verifying a UMLsec property

Once the UMLsec property has been defined, we want to make sure that the model actually enforces it. Not only do we want to make sure that the model doesn't allow a user to perform an operation s/he is not authorised to perform, but we also want to make sure that rules like separation of duty are actually enforced. Verification of the enforcement of the access control definition by the model already exists for the current UMLsec RBAC property, but it is limited to activity diagrams.

With the extended access control model that we propose come new challenges in verifying the suitability of the model. Not only will we have to verify new properties on the activity diagram, but we will also have to verify the other diagrams of the model that may contain access control rules: class diagrams, sequence diagrams, and so on. Since the access control definition might be spread over several diagrams, we will also have to verify that it doesn't contain any contradiction.

2.3 Code generation from a UMLsec specification

Once access control permissions have been defined for a model using UMLsec, we want to generate code that actually enforces them. We compared two different code generation approaches starting from the existing RBAC UMLsec property. The first one produces object-oriented code, while the second one produces aspect-oriented code [IKL+97] to enforce the access control permissions, together with object-oriented code for the functional code. The second solution appears to provide a better implementation, since the access control enforcement code is clearly separated from the functional code. It also makes further changes to the code easier to perform, and makes the traceability between the code and the UMLsec access control property easier to maintain. Moreover, the current implementation only generates code for the JAAS framework [jaa01]. We would like to offer the possibility to generate code for other frameworks as well.

3 Merging conflicting access control properties

An interesting case of evolution of a software system is merging conflicting access control properties. An example might be two companies merging, each running its own software with its own access control properties.
Rationalising the new company's information system will imply using only one system, with only one access control property. We want to propose a framework, based on UMLsec, to allow one to merge several access control properties on a given model. Conflicting definitions of roles are likely to arise, as well as conflicting
constraints and assignments. We want to give developers the opportunity to identify possible conflicts.

Assume that we have two different access control properties defined using UMLsec on the same model. If we can verify that the model enforces both definitions individually, then we want to merge those two definitions, raise possible conflicts to the user, and, once those conflicts have been resolved, the resulting access control property will also be enforced by the model. This process is described in Figure 1.

Figure 1: Merging access control properties

References

[ABB+03] A. Abou El Kalam, R. El Baida, P. Balbiani, S. Benferhat, F. Cuppens, Y. Deswarte, A. Miège, C. Saurel, and G. Trouessin. Organization Based Access Control, June 2003.
[IKL+97] John Irwin, Gregor Kiczales, John Lamping, Jean-Marc Loingtier, Chris Maeda, Anurag Mendhekar, and Cristina Videira Lopes. Aspect-oriented programming. Proceedings of the European Conference on Object-Oriented Programming (ECOOP), June 1997.
[jaa01] JAAS tutorials, 2001. http://java.sun.com/j2se/1.5.0/docs/guide/security/jaas/tutorials/index.html (Last accessed September 2009).
[Jür04] Jan Jürjens. UMLsec tool, 2004. Published at http://mcs.open.ac.uk/jj2924/umlsectool/index.html (Accessed Sept. 2008).
[Jür05] Jan Jürjens. Secure Systems Development with UML. Springer-Verlag, 2005.
[LBD02] Torsten Lodderstedt, David Basin, and Jürgen Doser. SecureUML: A UML-based modeling language for model-driven security, 2002.
[OMG10] OMG. Object Constraint Language (OCL) 2.2, February 2010. http://www.omg.org/spec/OCL/2.2/ (last accessed May 2010).
[SFK00] R. Sandhu, D. Ferraiolo, and R. Kuhn. The NIST model for role-based access control: towards a unified standard. In Proceedings of the fifth ACM workshop on Role-based access control, pages 47–63, 2000.
    2010 CRC PhDStudent Conference Effective Tutoring with Affective Embodied Conversational  Agents      Sharon Moyo  menziwa@hotmail.com    Supervisors  Dr Paul Piwek  Dr Neil Smith  Department/Institute  Computing  Status  Part‐time  Probation viva   After   Starting date  Oct 2007    This  natural  language  generation  project  aims  to  investigate  the  impact  of  affect  expression using embodied conversational agents (ECAs) in computer‐based learning  environments. Based on the idea that there is a link between emotions and learning,  we  are  developing  an  affect  expression  strategy.  We  will  implement  the  strategy  within a tutoring system in two domains: Information Technology (IT) and Business  Studies.     Current research has not firmly established the impact of affect expression strategies  within  tutorial  feedback  which  supports  learners  in  computer‐based  learning  environments  [1].  Our  approach  is  to  provide  affective  support  through  empathy.  Empathy  is  described  as  expressing  emotion  that  is  based  on  another’s  situation  (target) and not merely one’s own [2]. An individual can show: parallel empathy that  mirrors  the  target’s  emotion;  or  reactive  empathy  that  might  be  different  to  the  target’s emotion [2].     The empathic tutor interventions will be designed to support positive emotions [3]  and reduce negative learner emotions [4] using a range of verbal and non‐verbal (or  multimodal) interventions. These interventions will be combined with corrective and  meta‐cognitive feedback [5] and presented to users as a hint or summary.     We  will  conduct  a  series  of  studies.  Initially,  we  intend  to  develop  implement  and  evaluate an algorithm that generates multimodal empathic behaviours using an ECA.  The  experiment  conditions  will  include  multimodal  channels  of  communication:  speech  vs.  speech  and  facial  expression  vs.  speech  and  gesture  vs.  speech,  facial  expression  and  gesture.  We  hypothesize  that  participants  will  identify  the  ECA’s  expression  most  accurately  in  the  condition  using  three  channels  to  generate  affective expressions in comparison to the other conditions.            Page 58 of 125
    2010 CRC PhDStudent Conference   Additionally we aim to evaluate when and how parallel or reactive empathy can be  used  to  best  effect  in  learning  environments.  Subsequently,  we  will  integrate  the  algorithm into a web‐based tutoring environment and conduct an evaluation in the  domain of Business Studies. Finally, in the main study we will evaluate the empathic  tutoring  system  in  a  classroom  setting  over  several  weeks  in  the  domain  of  Information Technology (IT).     We  intend  to  contribute  to  current  research  by  describing  how  an  ECA  can  effectively  express  multimodal  [6]  empathic  behaviour  within  computer‐based  learning.  More  specifically,  we  aim  to  create  a  framework  to  model  parallel  and  reactive empathy and the learning contexts where they can be used in a quiz‐based  web  environment.  We  intend  to  validate  these  results  through  evaluations  across  two  domains:  Information  Technology  and  Business  demonstrating  that  the  framework can be applied to other quiz‐based learning environments.     References:    1.  Arroyo,  I.,  et  al.  Designing  Affective  Support  to  Foster  Learning,  Motivation  and Attribution. in AIED 2009. 2009. Brighton, UK: IOS.  2.  Davis,  M.,  Empathy:  A  Social  Psychological  Approach.  1994,  Madison,  WI:  Brown and Benchmark.  3.  Bickmore, T. and D. Schulman, Practical approaches to comforting users with  relational  agents,  in  CHI  '07  extended  abstracts  on  Human  factors  in  computing systems. 2007, ACM: San Jose, CA, USA.  4.  Burleson,  W.,  Affective  learning  companions:  Strategies  for  empathetic  agents  with  real‐time  multimodal  affective  sensing  to  foster  meta‐cognitive  and  meta‐affective  approaches  to  learning,  motivation,  and  perseverance.  .  2006, Massachusetts Institute of Technology: Cambridge, MA.  5.  Tan,  J.  and  G.  Biswas.  The  Role  of  Feedback  in  Preparation  for  Future  Learning:  A  Case  Study  in  Learning  by  Teaching  Environments.  in  ITS  2006.  2006: Springer‐Verlag.  6.  Cassell,  J.,  et  al.  Animated  Conversation:  Rule‐Based  Generation  of  Facial  Expression,  Gesture  and  Spoken  Intonation  for  Multiple  Conversational  Agents. . in Siggraph 94, ACM SIGGRAPH. 1994: Addison Wesley.      Page 59 of 125
    2010 CRC PhDStudent Conference Evaluating a mobile learning environment in a home care domain Brendan Murphy brendan.murphy@cordia.co.uk Supervisors Dr Shailey Minocha Dr Mark Woodroffe Department/Institute Computing Status Part time Probation viva After Starting date January 2008 BACKGROUND The growth in wireless mobile infrastructure and the rise in functionality of handheld smartphones has opened up opportunities for advanced use over traditional voice and limited data management. One such opportunity is for mobile learning. There has been a great deal of debate about this subject and the term itself has proved difficult to define. Substantial amounts of research has been carried out into mobile learning in the education sector however there has been significantly less carried out relating to mobile learning in the workplace, and none concerning mobile learning in a home care 1 environment. RESEARCH QUESTIONS My research project sets out to investigate the success of mobile learning in a home care domain. I am interested to discover if there is a difference in successful learning outcomes when comparing learning carried out in the classroom to that carried out in a mobile environment. Understanding the drivers that encourage home care staff to engage in mobile learning as well as the role played by technology in learning activities are also of importance. My research questions are as follows: Is learning more successful when carried out in a situated, mobile environment, than similar learning completed in the classroom? What processes do learners go through to achieve their learning outcomes when using mobile technology to learn? What conclusions can be drawn from the above to influence the development and design of mobile learning environments? 1 Home care refers to a range of services provided to the elderly and those with care needs on a regular basis. Services are commissioned by local authorities and provided by qualified home care staff working in the community. Page 60 of 125
    2010 CRC PhDStudent Conference KEY RESEARCH Research in the home care domain has been limited to the investigation of using hand held mobile computing devices for enhanced communication and for access to back- office client care plans. In Sweden, the government provided assistant nurses (home help personnel or HHPs) with PDAs containing patient information and care visit information (scheduling). This negated the need for carers to visit administrative offices regularly to pick up work and access client information. A research project looked at the navigational aspects of the software used on the PDAs and its overall ease of use. This project focussed on implementing a user interface design that presented a large data set in an easy to navigate way and made use of tabs of related information that HHPs could easily navigate. Findings concluded that ease of use was important to the care workers as was access to integrated patient detail bringing together disparate systems into a single integrated application. The requirement to use the system to record information was less important to HHPs than the availability to view information (Scandurra, Hagglund et al. 2004). A relevant example of home care research took place in the Saskatoon District Health jurisdiction in Canada. Here, care workers operate in the same loosely-coupled way that they do in my own organisation. Loose-coupling refers to carers having variability in their work schedules, few meetings with team members and a high degree of work autonomy. A study into the communication that existed between carers drew interesting conclusions. Carers were found to consider the effort required to communicate with other team members very difficult and only did so when the need was urgent; they preferred asynchronous communication routes when communication was required (allowing them the flexibility to communicate whilst maintaining total flexibility in their schedule); due to a difficulty in synchronicity of communication, care workers were judicious and prioritised any communication that required to be carried out; and learning about other care workers was done through ‘traces’ left in shared work locations such as the homes of the elderly patients they were visiting. (Pinelle and Gutwin 2003). Findings in this study provides evidence that home carers themselves chose the best ways to communicate in a loosely-coupled work environment. These findings may influence my research project in that carers may chose to learn in the same way that this study shows they communicate, essentially making personal decisions as and when to learn and whether this learning is effective. My research project considers mobile learning, and the literature identifies models that have been applied conceptually and practically to better understand this area of learning. Two models that are of relevance to my own research are discussed briefly below. Sharples asserts that current theories of learning don’t fully take advantage of the opportunities provided in a mobile environment. The opportunity that exists to take learning from the standard classroom environment and place it in the into the learner’s own environment will make it become more personal, situated, and collaborative (Sharples, Taylor et al. 2005). The task model of mobile learning considers two superimposed layers. The semiotic layer is concerned with how the learner gains new knowledge and skills mediated by the learning environment they are in, and the Page 61 of 125
technological layer considers how the learner engages with technology to learn. These two layers allow the model to be used flexibly in research, as it enables a focus either on the tools used for learning or on the success of the actual learning itself. This model also considers the socio-cultural factors of control, context and communication related to mobile learning. These are important considerations, as learning is rarely a singular activity, relying on interactions with educators, technologies and other learners. Frohberg explains these factors, describing the portability of mobile learning as providing the means to learn in context, which in turn presents the challenge of moderating the learning in some way (control), while communication increases the success of learning when learners and educators share and learn together (Frohberg, Goth et al. 2009).

A second relevant model is Koole and Ally's 'framework for the rational analysis of mobile education' (FRAME) model, which can be used to help inform a better understanding of mobile learning. This model was designed to determine the effectiveness of devices used for mobile learning as well as to address the relationship between mobile learning, human capability to learn and social interaction in learning. The FRAME model helps researchers gain a better understanding of the complex nature of mobile learning. The model considers three aspects, namely: the device, the learner, and the social aspects of mobile learning. The model asserts that convergence of these aspects can lead to better collaboration amongst learners, access to information and a better understanding of learning (Koole and Ally 2006).

RESEARCH METHODOLOGY

Adopting a socially constructed approach to research is important where there is a requirement to be more absorbed and involved in the research process itself. In relation to this approach, Saunders emphasises the importance of understanding the subjective meaning that motivates actions in order to fully understand the actions themselves (Saunders, Lewis et al. 2003). The home care domain is rich and complex, and this makes it suited to this type of approach. Table 1 shows my summary research activities and proposed methods.

Phase ONE. Research question: Is learning more successful when carried out in a situated, mobile environment, than similar learning completed in the classroom? Methods used: survey; direct observation; user profiles/personas.
Phase TWO. Research question: What processes do learners go through to achieve their learning outcomes when using mobile technology to learn? Methods used: diary study.
Phase THREE. Research question: What conclusions can be drawn from the above to influence the development and design of mobile learning environments? Methods used: focus groups.

Table 1 – proposed research methods
    2010 CRC PhDStudent Conference FINDINGS I have carried out pilot empirical research activities with a small group of home care staff and managers. Activities included a domain analysis and a user profile. The aim of these limited activities was to give me a more detailed understanding of the domain as well as the level of technology deployed and the success of this technology in the domain. Domain analysis The home care service in Glasgow is a complex one that provides a range of services delivered every day of the year, and for 24 hours each day. Over 3000 carers work in the service and they are supervised by 220 home care co-ordinators. On a daily basis, 10000 clients receive a service. Clients are largely elderly or are adults with physical or learning disabilities. Home care co-ordinators are generalists, each carrying out a full range of care duties. Previously, specialist co-ordinators provided medical tasks such as wound dressing, cream application and stoma care. These tasks are now carried out by all home carers. Home care co-ordinators carry out administrative duties in support of a team of 10 basic home carers though they also provide care services to a reduced number of clients on a daily basis. Organising learning in the home care service is difficult and this is principally due to the logistics of covering duty shifts and the requirement to provide continuity of care at all times for clients making it undesirable to replace carers on a regular basis. The organisational learning activities offered to home care staff are delivered from one purpose-built centre situated in the North of Glasgow. Due to capacity issues in this learning centre, only statutory learning required by monitoring authorities is offered to home care co-ordinators. User profile The home care user profile identified carers as being largely female, in their mid-40s with a basic secondary school education. Carers have limited IT ability though this was largely perceived by the carers themselves (and by the IT staff supporting them) – and with probing this was not necessarily the case with home carers showing some technical ability when using personal technologies such as MP3 players, digital TV and digital cameras. The principal reason for home carers working in the domain is their inherent desire to care for those who are elderly and vulnerable. Home carers adopt technology with relative ease and the use of new technologies at work has made them more disposed to their adoption. The role of informal technology champions when new technology is implemented is a critical one. ENDS Page 63 of 125
    2010 CRC PhDStudent Conference REFERENCES Frohberg, D., C. Goth, et al. (2009). "Mobile learning projects - a critical analysis of the state of the art." Journal of Computer Assisted Learning(25): 307-331. Koole, M. and M. Ally (2006). Framework for the Rational Analysis of Mobile Education (FRAME) Model: Revising the ABCs of Educational Practices. Networking, International Conference on Systems and International Conference on Mobile Communications and Learning Technologies. Pinelle, D. and C. Gutwin (2003). Designing for Loose Coupling in Mobile Groups. GROUP '03. Sanibel Island, Florida, USA, ACM: 75-84. Saunders, M., P. Lewis, et al. (2003). Research Methods for Business Students. Harlow, Pearson Education Limited. Scandurra, I., M. Hagglund, et al. (2004). Integrated Care Plan and Documentation on Handheld Devices in Mobile Home Care. MobileHCI 2004, Glasgow, Scotland. Sharples, M., J. Taylor, et al. (2005). "Towards a Theory of Mobile Learning." Centre for Educational Technology and Distance Learning, University of Birmingham: 9. Page 64 of 125
Generating Accessible Natural Language Explanations for OWL Ontologies

Tu Anh Nguyen
t.nguyen@open.ac.uk

Supervisors: Richard Power, Paul Piwek, Sandra Williams
Department/Institute: Computing Department
Status: Full-time
Probation viva: Before
Starting date: October 2009

Introduction

This research aims to develop a computational approach to generating accessible natural language explanations for entailments in OWL ontologies. Its purpose is to support non-specialists, people who are not expert in description logic and formal ontology languages, in understanding why an inference or an inconsistency follows from an ontology. This would help to further improve the ability of users to successfully debug, diagnose and repair their ontologies. The research is linked to the Semantic Web Authoring Tool (SWAT) project, an ongoing project aiming to provide a natural language interface for ordinary users to encode knowledge on the semantic web. The research questions are:

• Do justifications for entailments in OWL ontologies conform to a relatively small number of common abstract patterns, such that we could generalise the problem to generating explanations by patterns?
• For a given entailment and its justification, how can we produce an explanation in natural language that is accessible to non-specialists?

An ontology is a formal, explicit specification of a shared conceptualisation [6]. An ontology language is a formal language used to encode ontologies. The Web Ontology Language, OWL [8], is a widely used description logic based ontology language. Since OWL became a W3C standard, there has been a remarkable increase in the number of people trying to build and use OWL ontologies. Editing environments such as Protégé [15] and Swoop [13] were developed in order to support users with editing and creating OWL ontologies. As ontologies have begun to be widely used in real world applications and more expressive ontologies have been required, there is a significant demand for editing environments that provide more sophisticated editing and browsing services for debugging and repairing.

In addition to performing standard description logic reasoning services, namely satisfiability checking and subsumption testing, description logic reasoners such as FaCT++ [22] and Pellet [20] can compute entailments (e.g., inferences) to improve users' comprehension of their ontologies. However, without some kind of explanation, it can be very difficult for users to figure out why entailments are derived from ontologies. The generation of justifications for entailments has proven enormously helpful for identifying and correcting mistakes or errors in ontologies. Kalyanpur and colleagues defined a
justification for an entailment of an ontology as the precise subset of logical axioms from the ontology that are responsible for the entailment to hold [12]. Furthermore, he presented a user study showing that the availability of justifications had a remarkable positive impact on the ability of users to debug and repair their ontologies [11]. Justifications have also recently been used for debugging very large ontologies such as SNOMED [1], whose size is too large to debug and repair manually.

There are several recent studies into capturing justifications for entailments in OWL ontologies [12, 21, 9]. Nevertheless, OWL is a semantic markup language based on RDF and XML, languages that are oriented toward machine processability rather than human readability. Moreover, while a justification gathers together the axioms, or premises, sufficient for an entailment to hold, it is left up to the reader to work out how these premises interplay with each other to give rise to the entailment in question. Therefore, many users may struggle to understand how a justification supports an entailment, since they are either unfamiliar with OWL syntax and semantics, or lack knowledge about the logic underpinning the ontology. In other words, the ability of users to work out how an entailment arises from a justification currently depends on their understanding of OWL and description logic.

In recent years, the development of ontologies has been moving from "the realm of artificial intelligence laboratories to the desktops of domain experts", who have insightful knowledge of some domain but no expertise in description logic and formal ontology languages [14]. It is for this reason that the desire to open up OWL ontologies to a wide non-specialist audience has emerged. Obviously, wide access to OWL ontologies depends on the development of editing environments that use some transparent medium; and natural language (e.g., English, Italian) text is an appropriate choice since it can be easily comprehended by the public without training. Rector and colleagues observed common problems that users frequently encounter in understanding the logical meaning and inferences when working with OWL-DL ontologies, and expressed the need for a "pedantic but explicit" paraphrase language to help users grasp the accurate meaning of logical axioms in ontologies [18].

Several research groups have proposed interfaces to encode knowledge in semantics-based Controlled Natural Languages (CNLs) [19, 4, 10]. These systems allow users to input sentences conforming to a CNL, then parse and transform them into statements in formal ontology languages. The SWAT project [16] introduces an alternative approach based on Natural Language Generation. In SWAT, users specify the content of an ontology by "directly manipulating on a generated feedback text" rather than using text interpretation; therefore, "editing ontologies on the level of meaning, not text" [17].

Obviously, the above-mentioned interfaces are designed for use by non-specialists to build up ontologies without having to work directly on formal languages and description logic. However, research on providing more advanced editing and browsing services on these interfaces to support the debugging and repairing process has not yet been investigated.
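To make the notion of a justification concrete, the toy sketch below restricts attention to atomic subsumption axioms and searches, by brute force, for a smallest subset of axioms from which a given subsumption still follows. It is an illustration only, not one of the cited algorithms; the class and method names are invented, and justification finding over full OWL requires a description logic reasoner, as in the work of Kalyanpur and colleagues [12].

```java
import java.util.*;

/**
 * Toy illustration of a "justification", restricted to axioms A SubClassOf B.
 * An entailment A SubClassOf C holds if it follows by transitivity; a
 * justification is a minimal subset of axioms from which it still follows.
 */
public class JustificationSketch {
    record Axiom(String sub, String sup) {
        public String toString() { return sub + " SubClassOf " + sup; }
    }

    // Does the axiom set entail sub SubClassOf sup? (reachability / transitivity)
    static boolean entails(Collection<Axiom> axioms, String sub, String sup) {
        Deque<String> todo = new ArrayDeque<>(List.of(sub));
        Set<String> seen = new HashSet<>();
        while (!todo.isEmpty()) {
            String c = todo.pop();
            if (c.equals(sup)) return true;
            if (seen.add(c)) {
                for (Axiom a : axioms) if (a.sub().equals(c)) todo.push(a.sup());
            }
        }
        return false;
    }

    // Smallest subset of axioms that still entails the statement (brute force).
    static Optional<Set<Axiom>> justification(List<Axiom> axioms, String sub, String sup) {
        int n = axioms.size();
        for (int size = 1; size <= n; size++) {
            for (int mask = 0; mask < (1 << n); mask++) {
                if (Integer.bitCount(mask) != size) continue;
                Set<Axiom> subset = new HashSet<>();
                for (int i = 0; i < n; i++) if ((mask & (1 << i)) != 0) subset.add(axioms.get(i));
                if (entails(subset, sub, sup)) return Optional.of(subset);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Axiom> ontology = List.of(
                new Axiom("Cat", "Pet"),
                new Axiom("Pet", "Animal"),
                new Axiom("Animal", "LivingThing"),
                new Axiom("Dog", "Pet")); // irrelevant to the entailment below
        // Entailment: Cat SubClassOf LivingThing.
        System.out.println(justification(ontology, "Cat", "LivingThing"));
        // Prints the three axioms forming the chain Cat -> Pet -> Animal -> LivingThing;
        // the Dog axiom is not part of the justification.
    }
}
```

An accessible explanation would then verbalise exactly these chained axioms, rather than presenting the raw axiom set.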
Despite the usefulness of providing justifications in the form of sets of OWL axioms, understanding why entailments or inconsistencies are drawn from an ontology is still a key problem for non-specialists. Even for specialists, having a more user-friendly view of an ontology with accessible explanations can be very helpful. Thus, this project seeks to develop a computational approach to generating accessible natural language explanations for entailments in OWL ontologies, in order to assist users in debugging and repairing their ontologies.

Methodology
The research approach is to identify common abstract patterns of justifications for entailments in OWL ontologies. Having identified such patterns, we will focus on generating accessible explanations in natural languages for the most frequently used patterns. A preliminary study to work out the most common justification patterns has been carried out. A corpus of eighteen real and published OWL ontologies of different expressivity has been collected from the Manchester TONES repository. In addition, the practical module developed by Matthew Horridge, based on the research on finding all justifications for OWL-DL ontologies [12, 7], has been used. Justifications are computed and then analysed to work out the most common patterns. Results from the study show that, of the 6772 justifications collected, more than 70 percent belong to the top 20 patterns. A study on a larger and more general ontology corpus will be carried out as a next step. Moreover, a user study is planned to investigate whether non-specialists perform better on a task when reading accessible explanations rather than justifications in the form of OWL axioms.

The research on how to create explanations accessible to non-logicians is informed by studies on proof presentation. In Natural Deduction [5], how a conclusion is derived from a set of premises is represented as a series of intermediate statements linking the premises to the conclusion. While this approach makes it easy for users to understand how to derive one step from the next, it can make it difficult to see how those steps link together to form the overall picture of the proof. Structured derivations [2], a top-down calculational proof format that allows inferences to be presented at different levels of detail, seem to be an alternative approach for presenting proofs. They were proposed by researchers as a method for teaching rigorous mathematical reasoning [3]. Whether using structured derivations would help to improve the accessibility of explanations, as well as where and how intermediate inferences should be added, is being investigated.

Conclusion

Since the desire to open up OWL ontologies to a wide non-specialist audience has emerged, several research groups have proposed interfaces to encode knowledge in semantics-based CNLs. However, research on providing debugging and repairing services on these interfaces has not yet been investigated. Thus, this research seeks to develop a computational approach to generating accessible explanations to help users understand why an entailment follows from a justification. The research work includes identifying common abstract justification patterns and investigating how to generate explanations accessible to non-specialists.

References

[1] F. Baader and B. Suntisrivaraporn. Debugging SNOMED CT Using Axiom Pinpointing in the Description Logic EL+. In KR-MED, 2008.
[2] R. Back, J. Grundy, and J. von Wright. Structured Calculational Proof. Technical report, The Australian National University, 1996.
[3] R.-J. Back and J. von Wright. A Method for Teaching Rigorous Mathematical Reasoning. In ICTMT4, 1999.
[4] A. Bernstein and E. Kaufmann. GINO - A Guided Input Natural Language Ontology Editor. In ISWC, 2006.
[5] G. Gentzen. Untersuchungen über das logische Schließen. II. Mathematische Zeitschrift, 39:405–431, 1935.
[6] T. R. Gruber. A translation approach to portable ontology specifications. Knowledge Acquisition, 5:199–220, 1993.
[7] M. Horridge, B. Parsia, and U. Sattler. Laconic and Precise Justifications in OWL. In ISWC, pages 323–338, 2008.
[8] I. Horrocks, P. F. Patel-Schneider, and F. van Harmelen. From SHIQ and RDF to OWL: The Making of a Web Ontology Language. J. Web Semantics, 1:7–26, 2003.
[9] Q. Ji, G. Qi, and P. Haase. A Relevance-Directed Algorithm for Finding Justifications of DL Entailments. In ASWC, pages 306–320, 2009.
[10] K. Kaljurand and N. E. Fuchs. Verbalizing OWL in Attempto Controlled English. In OWLED, 2007.
[11] A. Kalyanpur. Debugging and repair of OWL ontologies. PhD thesis, University of Maryland, 2006.
[12] A. Kalyanpur, B. Parsia, M. Horridge, and E. Sirin. Finding All Justifications of OWL DL Entailments. In ISWC, 2007.
[13] A. Kalyanpur, B. Parsia, E. Sirin, B. Cuenca-Grau, and J. A. Hendler. Swoop: A Web Ontology Editing Browser. Journal of Web Semantics, 4:144–153, 2006.
[14] N. F. Noy and D. L. McGuinness. Ontology Development 101: A Guide to Creating Your First Ontology. Technical report, Stanford University, 2001.
[15] N. F. Noy, M. Sintek, S. Decker, M. Crubézy, R. W. Fergerson, and M. A. Musen. Creating Semantic Web Contents with Protégé-2000. IEEE Intell. Syst., 16:60–71, 2001.
[16] R. Power. Towards a generation-based semantic web authoring tool. In ENLG, pages 9–15, 2009.
[17] R. Power, R. Stevens, D. Scott, and A. Rector. Editing OWL through generated CNL. In CNL, 2009.
[18] A. Rector, N. Drummond, M. Horridge, J. Rogers, H. Knublauch, R. Stevens, H. Wang, and C. Wroe. OWL Pizzas: Practical Experience of Teaching OWL-DL: Common Errors & Common Patterns. In EKAW, 2004.
[19] R. Schwitter and M. Tilbrook. Controlled Natural Language meets the Semantic Web. In ALTW, pages 55–62, 2004.
[20] E. Sirin, B. Parsia, B. C. Grau, A. Kalyanpur, and Y. Katz. Pellet: A practical OWL-DL reasoner. Journal of Web Semantics, 5:51–53, 2007.
[21] B. Suntisrivaraporn, G. Qi, Q. Ji, and P. Haase. A Modularization-based Approach to Finding All Justifications for OWL DL Entailments. In ASWC, pages 1–15, 2008.
[22] D. Tsarkov and I. Horrocks. FaCT++ Description Logic Reasoner: System Description. In IJCAR, volume 4130, pages 292–297, 2006.
    2010 CRC PhDStudent Conference Supporting the Exploration of Research Spaces Chwhynny Overbeeke c.overbeeke@open.ac.uk Supervisors Enrico Motta, Tom Heath, Paul Mulholland Department Knowledge Media Institute Status Full-time Probation viva Before Starting date December 2009 1 Introduction It is often hard to make sense of what exactly is going on in the research community. What topics or researchers are new and emerging, gaining popularity, or disappearing? How does this happen and why? What are the key publications or events in a particular area? How can we understand whether geographical shifts are occurring in a research area? There are several tools available that allow users to explore different elements of a research area. However, making sense of the dynamics of a research area is still a very challenging task. This leads to my research question: How can we improve the level of support for people to explore the dynamics of a research commu- nity? 2 Framework and Background In order to answer this question we first need to identify the different elements, relations and dimensions that define a research area and put them into a framework. We then need to find existing tools that address these elements, and categorize them according to our framework in order to identify gaps in the current level of support. Some elements we already identified are: people, institutions and organizations, events, activity, popularity, publications, citations, time, geography, keywords, studentships, funding, impact, and technologies. The people element is about the researchers that are or were present in the research community, whilst the institutions and organizations element refers to the research groups, institutions, and organizations that are active within an area of research, and the affiliations the people within the community have with them. Events can be workshops, conferences, seminars, competitions, or any other kind of research-related happening. EventSeer1 is a service that aggregates all the calls for papers and event announcements that float around the web into one common, searchable tool. It keeps track of events, people, topics and organizations, and lists the most popular people, topics, and organizations per week. 1 http://www.eventseer.net Page 69 of 125
    2010 CRC PhDStudent Conference The activity element refers to how active the researchers, institutions, and organizations are within the field, for instance event attendance or organization, or the number and frequency of publications and events. A tool that can be used to explore this is Faceted DBLP2 , a server interface for the DBLP server3 which provides bibliographic information on major computer science journals and proceedings [Ley 2002]. Faceted DBLP starts with some keyword and shows the result set along with a set of facets, e.g. distinguishing publication years, authors, venues, and publication types. The user can characterize the result set in terms of main research topics and filter it according to certain subtopics. There are GrowBag graphs available for keywords (number of hits/coverage). Popularity is about the interest that is displayed in a person, institution or organization, publica- tion, topic, technology, or event. WikiCFP4 is a service that helps organize and share academic information. Users can browse and add calls for papers per subject category, and users to add calls for papers to their own personal user list. Each call for paper has information on the event name, date, location, and deadline. WikiCFP also provides hourly updated lists of the most popular categories, calls for papers, and user lists. One indicator of topic popularity is the number of publications on a topic. There are many tools that show the number of publications per topic per year. PubSearch is a fully automatic web mining approach for the identification of research trends that searches and downloads scientific publications from web sites that typically include academic web pages [Tho et al. 2003]. It extracts citations which are stored in the tool’s Web Citation Database which is used to generate temporal document clusters and journal clusters. These clusters are then mined to find their interrelationships, which are used to detect trends and emerging trends for a specified research area. Another indicator of popularity is how often a publication or researcher is cited. Citations can also help identify relations between researchers through analysis of who is citing who and when, and what their affiliations are. Publish Or Perish is a piece of software that retrieves and analyzes academic citations [Harzing and Van der Wal 2008]. It uses Google Scholar5 to obtain raw citations, and analyzes them. It presents a wide range of citation metrics such as the total number of papers and citations, average number of citations per paper and author, the average number of papers per author and year, an analysis of number of authors per paper, et cetera. Topics, interests, and people evolve over time, and the makeup of the research community changes when people and organizations enter or leave certain research areas or change their direction. Some topics appear to be more established or densely represented in certain geographical areas, for instance because a prolific institution is located there and has attracted several experts on a particular topic, or because many events on a topic are held in that area. AuthorMapper6 is an online tool for visualizing scientific research. It searches journal articles from the SpringerLink7 and allows users to explore the database by plotting the location of authors, research topics and institutions on a world map. It also allows users to identify research trends through timeline graphs, statistics and regions. 
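As a side note on the kind of citation metrics reported by tools such as Publish Or Perish, the short sketch below computes a few of them (total papers, total citations, citations per paper, and the h-index mentioned later) from a plain list of per-paper citation counts. It is an illustration only: the citation counts are invented, the counts are assumed to have been obtained already, and this is not the implementation of any of the tools discussed here.

```java
import java.util.*;

/** Illustrative computation of simple citation metrics from per-paper citation counts. */
public class CitationMetricsSketch {

    // h-index: the largest h such that at least h papers have at least h citations each.
    static int hIndex(List<Integer> citations) {
        List<Integer> sorted = new ArrayList<>(citations);
        sorted.sort(Collections.reverseOrder());
        int h = 0;
        while (h < sorted.size() && sorted.get(h) >= h + 1) {
            h++;
        }
        return h;
    }

    public static void main(String[] args) {
        // Hypothetical citation counts for one author's papers.
        List<Integer> citations = List.of(42, 17, 9, 5, 3, 0);
        int papers = citations.size();
        int total = citations.stream().mapToInt(Integer::intValue).sum();
        System.out.println("Papers:              " + papers);
        System.out.println("Citations:           " + total);
        System.out.println("Citations per paper: " + (double) total / papers);
        System.out.println("h-index:             " + hIndex(citations)); // 4 for this example
    }
}
```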
Keywords are an important indicator of a research area because they are the labels that have been put on publications or events by the people and organizations within that research area. Google 2 http://dblp.l3s.de/ 3 http://dblp.uni-trier.de/ 4 http://www.wikicfp.com/ 5 http://scholar.google.com/ 6 http://www.authormapper.com/ 7 http://www.springerlink.com/ Page 70 of 125
    2010 CRC PhDStudent Conference Scholar is a subset of the Google search index consisting of full-text journal articles, technical re- ports, preprints, thesis, books, and web sites that are deemed ’scholarly’ [Noruzi 2005, Harzing and Van der Wal 2008]. Google Scholar has crawling and indexing agreements with several publishers. The system is based on keyword search only and its results are organized by a closely guarded relevance algorithm. The ’cited-by-x’ feature allows users to see by whom a publication was cited, and where. The availability of new studentships indicates that a research area is trying to attract new people. This may mean that the area is hoping to expand, change direction, or become more established. The availability of funding within a research area or topic is an indicator of the interest that is displayed in it, or the level of importance it is deemed to have at a particular time. The Postgraduate Studentships web site8 offers a search engine as well as a browsable list of study or funding opportunities organized by subjects, masters, PhD/doctoral and professional doctorates and a browsable list of general funders, funding universities and featured departments. The site also lists open days and fairs. The level of impact of the research carried out by a research group, institution, organization or individual researcher leads to their establishment in the research community, which in turn could lead to more citations and event attendance. The technologies element refers to the technologies that are developed within an area of research, and their impact, popularity and establishment. Research impact is on a small scale implemented into Scopus (http://www.scopus.com/), currently a preview-only tool which, amongst other things, identifies and matches an organization with all its research output, tracks how primary research is practically applied in patents and tracks the influence of peer-reviewed research on web literature. It covers nearly 18,000 titles from over 5,000 publishers, 40,000,000 records, scientific web pages, and articles-in-press. A tool that ranks publi- cations is DBPubs, a system for analyzing and exploring the content of database publications by combining keyword search with OLAP-style aggregations, navigation, and reporting [Baid et al. 2008]. It performs keyword search over the content of publications. The meta data (title, author, venue, year et cetera) provide OLAP static dimensions, which are combined with dynamic dimen- sions discovered from the content of the publications in the search result, such as frequent phrases, relevant phrases and topics. Based on the link structure between documents (i.e. citations) publi- cation ranks are computed, which are aggregated to find seminal papers, discover trends, and rank authors. Finally, we would like to discuss a more generic tool, DBLife9 [DeRose et al. 2007, Goldberg and Andrzejewski 2007, Doan et al. 2006], which is a prototype of a dynamic portal of current informa- tion for the database research community. It automatically discovers and revisits web pages and resources for the community, extracts information from them, and integrates it to present a unified view of people, organizations, papers, talks, et cetera. For example, it provides a chronological summary, has a browsable list of organizations and conferences, and it summarizes interesting new facts for the day such as new publications, events, or projects. 
It also provides community statistics including top cited people, top h-indexed people, and top cited publications. DBLife is currently unfinished and does not have full functionality, but from the prototype alone one can conclude it will most likely address quite a few elements from our framework. 8 http://www.postgraduatestudentships.co.uk/ 9 http://dblife.cs.wisc.edu/ Page 71 of 125
    2010 CRC PhDStudent Conference 3 Methodology In order to find out what are the key problems people encounter when trying to make sense of the dynamics of a research area we will carry out an empirical study, which consists of a task and a short questionnaire. The 30 to 40 minute task is to be carried out by around 10 to 12 subjects who will be asked to investigate a research area that is fairly new to them and write a short report on their findings. The subjects’ actions will be recorded using screen capture software and the subjects themselves will be videoed for the duration of the task so that the entire exploration process is documented. The screen capture will show the actions the subjects take and the tools they use to reach their goal. The video data will show any reactions the subjects may display during their exploration process, for example confusion or frustration with a tool they are trying to use. The questionnaire will be filled out by as many subjects as possible, who will be asked to identify the key elements of a research area which they would take into account when planning a PhD research. In the questionnaire people will be made aware of the framework we created, but we will allow for open answers and additions to the existing framework. The technical study will consist of an overview, comparison, critical review, and gap analysis of existing tools that support the exploration of the research community. It will link those tools to our framework in order to find out to what extent the several elements are covered by the existing tools. At this stage we will have highlighted the key elements that define a research area, identified gaps in the existing support for the exploration of the research community, and gathered evidence to support this by mapping existing tools to our framework, carrying out a practical task, and sending out a questionnaire. We will then aim to improve support for people to explore the dynamics of the research community by implementing novel tools, addressing the gaps that have emerged from these studies. Our hypothesis is that at least some of these gaps are due to the lack of integration between different types of data covering different elements of a research area. References Baid, A., Balmin, A., Hwang, H., Nijkamp, E., Rao, J., Reinwald, B., Simitsis, A., Sismanis, Y., and Van Ham, F. (2008). DBPubs: Multidimensional Exploration of Database Publications. Proceedings of the VLDB Endowment, 1(2):1456–1459. DeRose, P., Shen, W., Chen, F., Lee, Y., Burdick, D., Doan, A., and Ramakrishnan, R. (2007). DBLife: A Community Information Management Platform for the Database Research Commu- nity. In Weikum, G., Hellerstein, J., and Stonebraker, M., editors, Proceedings of the 3rd Biennial Conference on Innovative Data Systems Research (CIDR 2007), Asilomar, California, USA. Diederich, J. and Balke, W. (2008). FacetedDBLP - Navigational Access for Digital Libraries. Bulletin of the IEEE Technical Committee on Digital Libraries (TCDL), 4(1). Diederich, J., Balke, W., and Thaden, U. (2007). Demonstrating the Semantic GrowBag: Au- tomatically Creating Topic Facets for FacetedDBLP. In Proceedings of the ACM IEEE Joint Conference on Digital Libraries (JCDL 2007), Vancouver, British Columbia, Canada. Page 72 of 125
    2010 CRC PhDStudent Conference Doan, A., Ramakrishnan, R., Chen, F., DeRose, P., Lee, Y., McCann, R., Sayyadian, M., and Shen, W. (2006). Community Information Management. IEEE Data Engineering Bulletin, Special Issue on Probabilistic Databases, 29. Goldberg, A. and Andrzejewski, D. (2007). Automatic Research Summaries in DBLife. CS 764: Topics in Database Management Systems. Harzing, A. and Van der Wal, R. (2008). Google Scholar as a New Source for Citation Analysis. Ethics in Science and Environmental Politics, 8:61–73. Ley, M. (2002). The DBLP Computer Science Bibliography: Evolution, Research Issues, Perspec- tives. In Proceedings of the 9th International Symposium (SPIRE 2002), pages 481–486, Lisbon, Portugal. Noruzi, A. (2005). Google Scholar: The New Generation of Citation Indexes. Libri, 55:170–180. Tho, Q., Hui, S., and Fong, A. (2003). Web Mining for Identifying Research Trends. In Sembok, T., Badioze Zaman, H., Chen, H., Urs, S., and Myaeng, S., editors, Proceedings of the 6th Inter- national Conference on Asian Digital Libraries (ICADL 2003), pages 290–301, Kuala Lumpur, Malaysia. Springer. Page 73 of 125
Understanding technology-rich learning spaces

Nadia Pantidi, The Open University, Walton Hall, Milton Keynes, MK7 6AA, k.pantidi@open.ac.uk
Yvonne Rogers, The Open University, Walton Hall, Milton Keynes, MK7 6AA, y.rogers@open.ac.uk
Hugh Robinson, The Open University, Walton Hall, Milton Keynes, MK7 6AA, h.m.robinson@open.ac.uk

Abstract
A number of novel technology-rich learning spaces have been developed over the last few years. Many claims have been made in terms of how they can support and enhance learning, collaboration, community participation, and creativity. This line of research is investigating whether such learning spaces are living up to such claims. The approach is ethnographic; a number of field studies have been conducted examining how people use the spaces in practice. Findings so far have shown that the positioning of the technology, flexibility and a sense of ownership and control over the technology are key issues.

Keywords
Technology-rich learning spaces, ethnographic approach, designed and actual use

Introduction
In the last few years, a substantial amount of funding has been allocated to schools and universities around the world, but especially in the UK, for creating new 'technology-rich' learning spaces. These new spaces have been proposed as examples of future places for supporting and enhancing informal and formal learning, collaboration, creativity and socialising [4]. However, little is known as to whether these claims are being realized in actual practice. This research is examining how and whether they are used, focusing on the interdependence of physical space, furniture and technology configuration.

Background
Several studies of technology situated in educational settings have been carried out that focus on understanding how technology affects users' everyday life and vice versa, and whether the technology serves the purposes it was designed for. Findings from these studies have been mixed. For example, Brignull et al. [1] implemented Dynamo, a large multi-user interactive surface to enable the sharing and exchange of a wide variety of digital media, in the common room of a high school, and report that users appropriated the functionality of the display in a way that was consistent with the space's previous use. Moreover, it did not
support other uses that the researchers expected. Similarly, McDonald et al. [3] situated three proactive displays in an academic conference to augment the participants' interactions; specifically, to enhance the feeling of community, facilitate social networking and future collaborations. Findings from this study showed that people appropriated the technology by extending its use in an innovative and fun way which conflicted with the common practices and social conventions already in place and thus led to negative comments about the application. More dramatically, a study evaluating the use of interactive whiteboards in UK schools found no significant impact on the pupils' performance relating to the use of interactive whiteboards [2].

Much research to date has focused on single technology interventions, where a public display or interactive whiteboard has been placed in a pre-existing space to serve a specific purpose/functionality. However, there are learning spaces that have been designed from scratch to be 'technology-rich' and where their spatial and technological design is intended to be much broader (e.g. Saltire Centre, CILASS). An assortment of new technologies and furniture have been configured to create new learning spaces. This research focuses on how successful these multi-purpose spaces have been in supporting what they were designed for. The questions addressed are:

• What are the differences between anticipated and actual use (if any)?
• What is the nature of the interactional work in these novel spaces?
• How do people behave and interact with the space?
• How do people interact with each other and the technology?
• What insights emerge for the use of the technology by understanding the use of the physical space?

To address these questions, in situ ethnographic studies have been carried out on three multi-purpose technology-rich settings, called Dspace, Qspace and Cspace. Dspace was designed as a technology-rich space set in a library on a university campus. It was created as a creative play area for visitors to experiment with and explore new ideas and share knowledge; a space that brings together new technologies and ideas on how they could be used for learning and teaching now or in the future. Qspace is a large space that was designed to support a variety of planned learning activities (e.g. workshops) to enable groups of individuals to come together within a high technology environment to communicate their ideas and generate their designs in a creative way. It is a blank space that can be re-shaped physically and technologically depending on the activity that takes place. The space was deliberately designed to be technologically rich as a means of promoting creativity and supporting collaboration in innovative ways. Cspace was designed as a study space for students to work together both during lab sessions and in their own time. It is a flexible technology-rich working environment that allows multiple 'study' activities including teaching, programming, hardware experimentation, and facilitated discussions.

Methodology
The method used is ethnographic, involving participant observation and semi-structured interviews. A series of ethnographic studies was carried out in the different settings throughout the last 18 months and will
continue for another 6 months. The collected data consist of fieldnotes (made during or after the observational sessions), audio and video recordings, still pictures and documents. The data is analyzed and interpreted in terms of prevailing themes and tensions occurring between desired, actual and anticipated use.

Findings
As a result of the ethnographic approach, a rich description has been achieved, providing a unique understanding of the three settings' everyday use. In general, findings from all settings show how people appropriate technology-rich learning spaces quite differently from what the designers or managers have planned or anticipated. Additionally, a more in-depth examination of the findings provides a selection of interdependent vignettes that offer insights on critical issues such as the use of technology, the appropriation of the physical space, groupwork and individual work, private and public aspects of interaction, and the community of users.

Regarding the use of the technology, the insights emerging so far suggest that for technology-rich learning spaces to be successful, they need to be flexible (supporting fluid transitions from individual work to group work and from public to private use), lightweight (users moving between the spaces' and their own devices) and accessible (providing to the users the option to control and take ownership over the technology). For instance, fieldwork data showed that Cspace was set up in a way that offered the students the freedom to choose how and when to use it. The technology in the space consisted of both laptops/tablet PCs and SmartBoards, providing users the option to switch between individual and group work, and also to share (public) or not (private) their work with others. Moreover, the technology was 'out there' for anyone to walk in and use it, and students were allowed to 'plug and play' with their personal devices (laptops, mp3 players, mobiles) and combine them with the existing technology of the space (Figure 1). This technological flexibility, among other things, contributed to Cspace becoming a 'hot spot'; a cosy learning space where students feel comfortable experimenting with technology and at the same time engaging in their everyday social and work activities.

Figure 1. On the left, students are collaborating by using the SmartBoard for shared content and the laptops and tablet PCs for private use; on the right, one of the students is using his iPhone and his personal tablet PC in combination with the existing technology.

In contrast, Qspace proved to be rather technologically inflexible. The majority of activities involving technology, during the event observed, were limited to the managers of the space manipulating the lights via a display interface. The actual users did not appropriate or interact with the technology, as they didn't have direct access to it. The reason for this is that before any use of the space the managers pre-set how the technology can be used, depending on the needs of the event or the users. In addition, users are discouraged from using their own laptops or other devices in combination with the space's existing technology. In a
way, the technology was patrolled and used by the managers, and it was only 'post hoc' available to the actual users.

Another critical element for successful technology-rich learning spaces seems to be the physical arrangement of the technology in the space; specific spaces or physical layouts bear established associations and etiquettes that can affect the way users interact with or appropriate the technology. For example, in Dspace it was found that despite the abundance of technology and the many motivating cues and clues, its use was limited. The technology was not experimented with or played with in the ways planned for [5]. A plausible explanation for this, based on the collected data, has to do with the positioning of the technology in the space; most of the devices were placed on shelves (Figure 2), creating the impression that they were for display only, thus discouraging potential users from interacting with them.

Figure 2. A collection of mobile phones for users to interact with and experiment with are displayed on shelves.

Conclusion
This paper discusses briefly a selection of findings emerging from a series of ethnographic studies carried out in three novel technology-rich learning spaces. Our findings so far suggest that for these spaces to support informal and formal learning, collaboration, creativity and socialising, issues such as the spatial arrangement, flexibility and accessibility of the technology need to be considered. Future work involves further in situ studies in a variety of similar settings with the aim to develop a set of design guidelines and concerns for those involved in developing 'learning spaces' and 'classrooms of the future'.

References
[1] Brignull, H., Izadi, S., Fitzpatrick, G., Rogers, Y., and Rodden, T. The introduction of a shared interactive surface into a communal space. Proc. CSCW 2004, ACM Press (2004).
[2] Hennessy, S., Deaney, R., Ruthven, K., and Winterbottom, M. Pedagogical strategies for using the interactive whiteboard to foster learner participation in school science. Learning, Media and Technology, 32 (3), (2007), 283–301.
[3] McDonald, D.W., McCarthy, J.F., Soroczak, S., Nguyen, D.H., and Rashid, A.M. Proactive displays: Supporting awareness in fluid social environments. ACM Transactions on Computer-Human Interaction, 14 (4), Article 16, (2008).
[4] Oblinger, D. Learning Spaces. Educause, 2006.
[5] Pantidi, N., Robinson, H.M., and Rogers, Y. Can technology-rich spaces support multiple uses? Proc. British CHI Group Annual Conference on HCI (2), BCS (2008), 135-138.
    2010 CRC PhDStudent Conference How best to support scientific end-user software development? Aleksandra Pawlik a.n.pawlik@open.ac.uk Supervisors Dr. Judith Segal Prof. Marian Petre Prof. Helen Sharp Department/Institute Computing Status Full-time Probation viva Before Starting date October 2009 Introduction End-user software development has received substantial amounts of attention within both the academic and software engineering communities [1-3]. One of the sub- groups that can be distinguished amongst end-user developers is that of scientific end- user developers [4]. A particular set of characteristics differentiates scientists from other end-user developers. Firstly, working in the field of science often necessitates the use of various software packages on a daily basis. Secondly, scientists are familiar with and utilize formal languages as well as particular modelling techniques. Additionally, the majority of science degree curriculums offered by universities contain at least one course in programming. Thus, many scientists have some experience with coding at a relatively early stage of their academic and professional career. In many cases, conducting a scientific research project means developing a tailor-made software tool which will address a particular scientific problem. Therefore, it may seem that scientists are “predisposed” to being effective and successful end- user software developers more likely to produce a sustainable end-product software. However, numerous problematic issues related to scientific end-user software development have been reported by researchers in computing [5, 6], software engineers [7] and scientists themselves [8]. For the purpose of my research project, I will make the distinction between two different contexts within scientific end-user software development: - Limited Context: when software is developed (usually in a purely academic environment) in order to address a specific problem within a particular project which is being run by a limited group of anticipated users; - Extended Context: when it is expected that the software will be reusable, maintainable and flexible (i.e. potentially used by an extended group of as yet undetermined users). Scientific end-user software development needs, therefore, relevant and effective support from the software development professionals’ community. Despite the fact that some related help exists and is available [9], scientists who develop software and software engineers who collaborate with them at various levels may find scientific software development problematic. This indicates that the assistance and support provided may need adjustments and improvements, an objective that may be approached from different angles. First of all, it is essential to identify and examine difficulties which may crop up during scientific end-user software development. The second approach is to investigate and understand the origins of these problems. Page 78 of 125
    2010 CRC PhDStudent Conference Finally, we need to comprehend why the support available for scientific end-users provided by the software development professionals’ community does not seem to be working effectively and what steps should be taken to attempt to remedy this. I argue that these steps need to involve observing the practices applied during scientific software development in a number of different contexts. In my PhD research project, I intend to focus on exploring the tools and methods which scientific end-user developers employ in their work. The answer to the question ”What techniques do scientific end-user developers use?” should allow me to identify the ways in which scientists address issues that emerge during software development. Additionally, I will pay special attention to the methods which scientific end-user developers find successful. By “successful” I mean those that were introduced and maintained during part or indeed the whole cycle of software development, and which resulted in sustainable software. Thus, my second research question is “What are the problematic and successful applications of tools and techniques for supporting end-user software developers?". The results of my study may potentially provide sufficient information which could be used to tailor and improve ways of assisting scientific end-user development. Background A number of researchers investigated the characteristics and issues related to scientific end-user development. For example, Segal [10] notes that the software development process consists of short cycles and proposes an “iterative and incremental” model of scientific software development which is a result of the fact that the majority of scientific work remains experimental and is based on approximation models. Moreover, some scientific projects involve tacit knowledge, something which creates difficulties in establishing requirements and designing software packages [11]. The experimental nature of these scientific projects, the application of tacit knowledge and the approximations generated by mathematical models create a further problem, that of software testing [12] [13]. Some problems are generated by the fact that many scientific end-user developers make software within a very limited context of usage. The main aim of scientific projects is to advance science, deliver and publish the findings. The resources (time, finances and people) allocated to software development within the framework of a scientific project tend to be insufficient [14]. Therefore, scientists’ reluctance to apprehend, for example, object-oriented programming languages, and their preference to implement code in Fortran seems justified. Moreover, by sticking with familiar programming languages, scientific end-user developers reduce the risk of errors that might result from the use of languages which are new or unfamiliar to them [6]. Since, within the scientific working culture [5], software development is not made a high priority, scientists who develop software packages do not, as a result, receive relevant credit, something which tends to discourage them from putting more effort into creating sustainable software [14]. Other factors which contribute to problems with scientific end-user software development, such as lack of effective project management or problems with the labour division, may dissuade developers from making use of any version control systems or configuration management tools [15]. Page 79 of 125
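None of the cited studies include code, but the testing difficulty noted above ([12], [13]) is easy to illustrate with a small, purely hypothetical example in Python: because scientific code computes approximations, a test oracle cannot demand exact equality and has to accept agreement within a tolerance. The integrator, the reference value and the tolerance below are all invented for illustration.

    import math

    def trapezoid(f, a, b, n=1000):
        """Approximate the integral of f over [a, b] with the trapezoidal rule."""
        h = (b - a) / n
        total = 0.5 * (f(a) + f(b))
        for i in range(1, n):
            total += f(a + i * h)
        return total * h

    def test_trapezoid_against_known_integral():
        # The exact value of the integral of sin(x) over [0, pi] is 2.
        approx = trapezoid(math.sin, 0.0, math.pi, n=1000)
        # An exact-equality check (approx == 2.0) would fail: the result is only
        # an approximation, so the oracle is a tolerance rather than equality.
        assert abs(approx - 2.0) < 1e-5

    if __name__ == "__main__":
        test_trapezoid_against_known_integral()
        print("tolerance-based test passed")

Deciding whether such a tolerance is tight enough is itself a scientific judgement, which is part of what makes testing this kind of code hard for end-user developers.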
    2010 CRC PhDStudent Conference In fact, tailor-made resources relating directly to software engineering techniques and methods supporting scientific end-user software development are available and being continuously developed, mainly by software development professionals [16]. However, these resources only receive rather a poor uptake from the scientific community, as scientists prefer to teach themselves from, for example, generic textbooks, colleagues, the Internet, and so on [17] [6]. Additionally, as described by Kelly [18], the chasm that divides the different approaches to software development between the communities of scientific end-user developers and software development professionals only serves to cause further discrepancies in the overall communication between the two groups. Methodology I intend to investigate case studies of scientific end-user software development in which various software engineering techniques and methods were used in covering the following: - The transition of turning purely academic (Limited Context) scientific software packages into commercial ones; - The transition of turning purely academic (Limited Context) scientific software packages into open source (Extended Context) ones; - The development of scientific software which directly involves software development professionals (Extended Context). Since this PhD research project is exploratory in nature, qualitative research methods would seem to be the most appropriate. Moreover, studies in information systems are highly context-dependent and interpretative [19], something which requires making use of methods that allow researchers to investigate issues in depth. I will use interviews and participant observation as the main methods of data collection. The interviews will be conducted with both scientific end-user developers and software development professionals who are directly involved, together with scientists, in scientific software development teams. The former will constitute the majority of the respondent group whilst interviews with software development professionals will aim to provide additional information about the application of methods and techniques for supporting scientific end-user development. Ideally the interviews will be combined with participant observation enabling me to obtain a fuller picture of the process and to perceive any issues related to scientific end-user development. Two things will be crucial in the sampling of the case studies: being able to obtain maximum variation within the sample, but also the ability to include convenient sampling (e.g. contacting respondents, access to the fieldwork etc.), something which will doubtless have an impact on the final construction of the set of case studies. References [1] B. A. Myers, M. M. Burnett, S. Wiedenbeck, A. J. Ko, and M. B. Rosson, "End user software engineering: CHI: 2009 special interest group meeting," in Proceedings of the 27th international conference extended abstracts on Human factors in computing systems Boston, MA, USA: ACM, 2009. [2] H. Lieberman, Paternò, F., Wulf, V., "End user development," Dordrecht, The Netherlands: Springer, 2006. Page 80 of 125
    2010 CRC PhDStudent Conference [3] M. F. Costabile, P. Mussio, L. P. Provenza, and A. Piccinno, "End users as unwitting software developers," in Proceedings of the 4th international workshop on End-user software engineering Leipzig, Germany: ACM, 2008. [4] J. Segal and S. Clarke, "Point/Counterpoint: Software Engineers Don't Know Everything about End-User Programming," Software, IEEE, vol. 26, pp. 54-57, 2009. [5] J. Segal, "Software Development Cultures and Cooperation Problems: A field Study of the Early Stages of Development of Software for a Scientific Community," Computer Supported Cooperative Work (CSCW), vol. 18, pp. 581-606, 2009. [6] R. Sanders and D. Kelly, "Dealing with risk in scientific software development," Software, IEEE, pp. 21-28, 2008. [7] V. R. Basili, D. Cruzes, J. C. Carver, L. M. Hochstein, J. K. Hollingsworth, M. V. Zelkowitz, and F. Shull, "Understanding the high-performance-computing community: A software engineer's perspective," Software, IEEE, vol. 25, pp. 29-36, 2008. [8] C. Rickett, S. Choi, C. Rasmussen, and M. Sottile, "Rapid prototyping frameworks for developing scientific applications: A case study," The Journal of Supercomputing, vol. 36, pp. 123-134, 2006. [9] G. Wilson, "Those Who Will Not Learn From History," Computing in Science and Engineering, vol. 10, p. 5, 2008. [10] J. Segal, "Models of scientific software development," in Workshop on Software Engineering in Computational Science and Engineering, Leipzig, Germany, 2008 [11] S. Thew, A. Sutcliffe, R. Procter, O. de Bruijn, J. McNaught, C. C. Venters, and I. Buchan, "Requirements Engineering for E-science: Experiences in Epidemiology," Software, IEEE, vol. 26, pp. 80-87, 2009. [12] D. Hook and D. Kelly, "Testing for trustworthiness in scientific software," in Proceedings of the 2009 ICSE Workshop on Software Engineering for Computational Science and Engineering: IEEE Computer Society, 2009. [13] S. Easterbrook and T. Johns, "Engineering the Software for Understanding Climate Change," Computing in Science and Engineering, vol. 26, 2009. [14] "Reporting Back - Open Middleware Infrastructure Institute Collaboration Workshops 2010," http://www.omii.ac.uk/wiki/CW10ReportingBack, 2010. [15] M. Vigder, "End-user software development in a scientific organization," in Proceedings of the 2009 ICSE Workshop on Software Engineering Foundations for End User Programming: IEEE Computer Society, 2009. [16] "Software Carpentry - an intensive introduction to basic software development practices for scientists and engineers," http://software-carpentry.org/. [17] G. Wilson, "How Do Scientists Really Use Computers?," American Scientist, vol. 97, pp. 360-362, 2009. [18] D. Kelly, "A software chasm: Software engineering and scientific computing," Software, IEEE, p. 120, 2007. [19] H. K. Klein and M. D. Myers, "A set of principles for conducting and evaluating interpretive field studies in information systems," MIS Quarterly, vol. 23, p. 67(2), 1999. Page 81 of 125
    2010 CRC PhDStudent Conference Non-Cooperation in Computational Models of Dialogue Brian Plüss b.pluss@open.ac.uk Supervisors Paul Piwek Richard Power Department/Institute Computing Department Status Fulltime Probation viva After Starting date 01/10/08 This research is aimed at finding a computable description (i.e., a computational model) of non-cooperative conversational behaviour in political interviews. Most approaches to finding computational models of dialogue behaviour are based on a strong notion of cooperation between the dialogue participants (DPs). Researchers have proposed models based on DP's intentions (Cohen and Levesque, 1991), regarded dialogue as a game with strict rules guiding the speaker's actions and the hearer's interpretation of those actions (Power, 1979), or addressed conversation as the joint construction of goal-oriented plans that are shared by DPs (Grosz and Sidner, 1990). These models successfully explain dialogue situations in which DPs recognise each other's intentions and, at least to a certain extent, accept each other's goals when deciding on their actions. These assumptions are theoretically grounded (Grice, 1975; Clark and Shaefer, 1989) and also practically sound: dialogue models are usually implemented in the form of dialogue systems, built for the purpose of providing a service to their users. Take, for instance, the following exchange from the TRAINS dialogue system (Ferguson et al. 1996): User I need to get to Detroit, Syracuse, and Montreal. System Ok. User Let's send the trains from Chicago to Detroit. System Ok. [draws route from Chicago to Detroit via Toledo] But the terminal at Toledo is delaying traffic due to bad weather. Trains will take an additional five hours to move through there. [highlights Toledo red] User That's ok. Now let's go from Atlanta up to Syracuse. System Please tell me a route to use to get from Atlanta to Syracuse. User Go via Charleston and Pittsburgh. (...) System No problem. [draws route] User That's great. System I hope it was helpful. In this scenario, failure to cooperate, either on the side of the system or of the user, is against the premises on which the system is conceived and used. In everyday Page 82 of 125
    2010 CRC PhDStudent Conference conversation, however, a great many situations escape these arguments. Consider as an example the following fragment1: Paxman We're joined now from his count in Bethnal Green and Bow by George Galloway. Mr Galloway, are you proud of having got rid of one of the very few black women in Parliament? Galloway What a preposterous question. I know it's very late in the night, but wouldn't you be better starting by congratulating me for one of the most sensational election results in modern history? Paxman Are you proud of having got rid of one of the very few black women in Parliament? Galloway I'm not, Jeremy move on to your next question. Paxman You're not answering that one? Galloway No because I don't believe that people get elected because of the colour of their skin. I believe people get elected because of their record and because of their policies. So move on to your next question. Paxman Are you proud... Galloway Because I've got a lot of people who want to speak to me. Paxman You... Galloway If you ask that question again, I'm going, I warn you now. Paxman Don't try and threaten me Mr Galloway, please. This research is aimed at shedding light on the nature of non-cooperation in dialogue, by capturing the intuitions that allow us to differentiate between both conversations in terms of participant behaviour; and at reproducing such conversational behaviour involving software agents. In other words, we are looking for an answer to the following question: What properties are needed in a computational model of conversational agents so that they can engage in non-cooperative as well as in cooperative dialogue, in particular in the domain of political interviews? Computational models of conversational agents are abstract, computable descriptions of autonomous agents that are able to engage in conversation (i.e., to participate in a dialogue displaying adequate conversational behaviour). Developing these models and their implementation would allow for a better understanding of the workings of dialogue. This approach is know as analysis-by-synthesis (Levinson, 1982). Prior to the development of a computational model, it is necessary to identify precisely the situations under study and the phenomena defining them. We achieved this by carrying on empirical studies of naturally-occurring data. In our case, we analysed broadcast political interviews with two main participants. Our distinction between cooperative and non-cooperative dialogue is based on the occurrence of particular phenomena, that we call non-cooperative features (NCFs). Intuitively, they refer to whether participants behave as is expected for the type of dialogue in which they engage, i.e., whether they follow the obligations imposed upon their conversational behaviour by the social context in which the exchange takes place (Traum and Allen, 1994). 1 BBC presenter Jeremy Paxman interviews MP George Galloway, shortly after his victory in the UK 2005 General Election (http://www.youtube.com/watch?v=tD5tunBGmDQ, last access May 2010). Page 83 of 125
    2010 CRC PhDStudent Conference We have chosen political interviews as the the domain for our study, because it provides a well-defined set of scenarios, scoping the research in a way that is suitable for a PhD project. At the same time, a wealth of interesting conversational situations arise in political interviews. In the English-speaking world, journalists are well-known for their incisive approach to public servants, while politicians are usually well trained to deliver a set of key messages when speaking in public, and to avoid issues unfavourable to their image. For the empirical analysis, we collected a corpus of political interviews with different levels of conflict between the dialogue participants. We proposed a technique for measuring non-cooperation in this domain using NCFs The number of occurrences of these features determines the degree of non-cooperation (DNC) of an exchange. NCFs are grouped following three aspects of conversation: turn-taking (Sacks et al., 1974), grounding (Clark and Schaefer, 1989) and speech acts (Searle, 1979). As we said above, they constitute departures from expected behaviour according to the social context of the exchange. Examples of NCFs include, among others, interruptions, overlapped speech, failure to acknowledge each other's contributions, the interviewer expressing a personal opinion or criticising the interviewee's positions on subjective grounds and the interviewee asking questions (except for clarification requests) or making irrelevant comments. The DNC was computed for all the political interviews in the corpus and preliminary results are encouraging. Adversarial interviews have a large number of NCFs, thus a high value for the DNC. On the other hand, collaborative exchanges have low occurrence of NCFs (or none at all). At the time of writing, we are designing two studies to evaluate the DNC measure. The first is structured as an annotation exercise in which 6 annotators will code dialogues from the corpus. The inter-annotator agreement (Krippendorf, 2004) will indicate whether or not we are describing NCFs to an acceptable level of precision. In the second study, participants will watch or listen to the dialogues in the corpus and provide a judgement based on their perception of the DPs behaviour with respect to what is expected from them in a political interview. The correlation between results from these studies will provide a level of confidence on the DNC measure. As for designing the model, dialogue games supporters could say that there is a game that describes the interaction in which Paxman and Galloway engaged in our second example. While this might be true, such an approach would force us, in the limit, to define one game for each possible conversation that would not fit a certain standard. Walton and Krabbe (1995) attempt a game-based approach in their study of natural argumentation. They claim that a rigorous model of conversational interaction is useful, but accept that most of the huge variety of every day conversation escapes it. Nevertheless, the rules and patterns captured by game models are useful, as they describe the expected behaviour of the DPs under a certain conversational scenario. In devising our model, we aim at reconciling two worlds, using the insights from dialogue games to provide a description of expected behaviour in the form of social obligations, but looking at naturally occurring cases that deviate from the norm. 
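The abstract states that the number of NCF occurrences determines the DNC, but it does not give the exact formula, so the following is only a minimal Python sketch: it counts labelled NCFs per category (turn-taking, grounding, speech acts) and, as an assumption not taken from the abstract, normalises the total by the number of turns. The individual feature labels are illustrative, not the full annotation scheme.

    from collections import Counter

    # Categories follow the three aspects of conversation named above;
    # the labels inside each set are illustrative only.
    NCF_CATEGORIES = {
        "turn_taking": {"interruption", "overlapped_speech"},
        "grounding": {"failed_acknowledgement"},
        "speech_acts": {"interviewer_personal_opinion", "interviewee_question",
                        "irrelevant_comment"},
    }

    def degree_of_non_cooperation(annotated_turns):
        """annotated_turns: list of (speaker, [ncf_label, ...]) pairs for one interview.

        Returns the NCF count per category and an overall DNC score, here taken
        to be the total number of NCF occurrences divided by the number of turns
        (the normalisation is an assumption, not the paper's definition).
        """
        counts = Counter()
        for _speaker, ncf_labels in annotated_turns:
            for label in ncf_labels:
                for category, labels in NCF_CATEGORIES.items():
                    if label in labels:
                        counts[category] += 1
        total = sum(counts.values())
        dnc = total / len(annotated_turns) if annotated_turns else 0.0
        return counts, dnc

    # Toy exchange: an adversarial fragment yields a high DNC, a collaborative one a low DNC.
    turns = [
        ("IR", []),                                  # interviewer asks a question
        ("IE", ["interviewee_question"]),            # interviewee replies with a question
        ("IR", ["interruption"]),
        ("IE", ["irrelevant_comment", "overlapped_speech"]),
    ]
    print(degree_of_non_cooperation(turns))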
Our hypothesis is that non-cooperative behaviour emerges from decisions DPs make based on conversational obligations and individual goals, with a suitable configuration of priorities associated with each of them. The construction of the model will be a formalization of our hypothesis, including rules for political interviews, goals, obligations, priorities and a dialogue management
component with the deliberation mechanism. We are currently investigating the line of research on obligation-driven dialogue modelling, initiated by Traum and Allen (1994) and developed further by Poesio and Traum (1998) and Kreutel and Matheson (2003). We are also implementing a prototype simulator based on the EDIS dialogue system (Matheson et al., 2000).

References
H.H. Clark and E.F. Schaefer. 1989. Contributing to discourse. Cognitive Science, 13(2):259–294.
P.R. Cohen and H.J. Levesque. 1991. Confirmations and joint action. In Proceedings of the 12th International Joint Conference on Artificial Intelligence, pages 951–957.
G. Ferguson, J.F. Allen, and B. Miller. 1996. Trains-95: Towards a mixed-initiative planning assistant, pages 70–77. AAAI Press.
H.P. Grice. 1975. Logic and conversation. Syntax and Semantics, 3:41–58.
B.J. Grosz and C.L. Sidner. 1990. Plans for discourse. Intentions in Communication, pages 417–444.
J. Kreutel and C. Matheson. 2003. Incremental information state updates in an obligation-driven dialogue model. Logic Journal of IGPL, 11(4):485.
K. Krippendorff. 2004. Content Analysis: An Introduction to Its Methodology, second edition. Sage, Thousand Oaks, CA.
S.C. Levinson. 1983. Pragmatics. Cambridge University Press.
C. Matheson, M. Poesio, and D. Traum. 2000. Modelling grounding and discourse obligations using update rules. In Proceedings of the 1st NAACL Conference, pages 1–8. San Francisco, CA, USA.
M. Poesio and D. Traum. 1998. Towards an axiomatization of dialogue acts. In Proceedings of the Twente Workshop on the Formal Semantics and Pragmatics of Dialogues, pages 207–222.
R. Power. 1979. The organisation of purposeful dialogues. Linguistics, 17:107–152.
H. Sacks, E.A. Schegloff, and G. Jefferson. 1974. A simplest systematics for the organization of turn-taking for conversation. Language, pages 696–735.
J.R. Searle. 1979. A taxonomy of illocutionary acts. Expression and Meaning: Studies in the Theory of Speech Acts, pages 1–29.
D.R. Traum and J.F. Allen. 1994. Discourse obligations in dialogue processing. In Proceedings of the 32nd Annual Meeting of the ACL, pages 1–8. Morristown, NJ, USA.
D. Walton and E. Krabbe. 1995. Commitment in Dialogue: Basic Concepts of Interpersonal Reasoning. State University of New York Press.
A Debate Dashboard to Support the Adoption of On-line Argument Mapping Tools Ivana Quinto ivana.quinto@unina.it Supervisors Zollo Giuseppe Iandoli Luca Department/Institute Department of Business and Managerial Engineering Status Fulltime Probation viva After Starting date February, 2009

Purpose – The literature describes an argument map as a representation of reasoning in which the evidential relationships among claims are made wholly explicit using graphical or other non-verbal techniques. Several web tools, also known as argument mapping tools, have been developed so far which apply an organizational and visualization approach based on argument mapping (e.g. Cohere, Deliberatorium, Debategraph, Truthmapping). Argument mapping provides a logical rather than time-based representation of users' contributions to a debate. This representation model has been shown to provide organizations with several advantages in knowledge sharing and deliberation, such as: i. encouraging evidence-based reasoning and critical thinking (Buckingham Shum and Hammond, 1994); ii. improving the understanding of large amounts of knowledge; iii. driving conversation toward effective deliberation (van Gelder, 2003); iv. expanding our capacity to grasp more complex discussions (Conklin, 2006). Nevertheless, these technologies still do not enjoy widespread diffusion, and the level of adoption in both small and large scale organizations is low. The aim of this paper is to investigate new technological solutions to support the adoption of argument mapping tools as a technology able to foster online knowledge sharing and deliberation processes among remote workers and/or suppliers and customers. The literature suggests that the main barrier to the adoption of mapping tools is, as for many mediating tools, the loss of information and feedback during conversation. During a conversation, participants exchange not only information but also evidence and/or requests for evidence, which helps them understand whether listeners have understood what the speakers have said (e.g., head nods or facial expressions). Once understood, information is used to update the participants' shared information (common ground). This process of making the understood information part of their common ground is called the grounding process (Clark and Brennan, 1991). The grounding process is crucial for the success of a conversation, because it helps people to increasingly understand each other. Clark and Brennan claim that a cognitive effort is required of people in order to ground what speakers have said during a conversation. A possible way to measure this effort is the evaluation of grounding costs, which may vary on the basis of the medium used to converse.
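Returning to the argument-map representation described at the start of this section, a minimal Python sketch of the kind of structure such tools make explicit is given below: claims connected by typed evidential links. This is purely illustrative and is not the data model of Cohere, Debategraph or any of the other tools mentioned.

    from dataclasses import dataclass, field

    @dataclass
    class ArgumentMap:
        """Claims connected by typed evidential links (supports / objects-to)."""
        claims: dict = field(default_factory=dict)   # claim_id -> claim text
        links: list = field(default_factory=list)    # (from_id, relation, to_id)

        def add_claim(self, claim_id, text):
            self.claims[claim_id] = text

        def relate(self, from_id, relation, to_id):
            assert relation in {"supports", "objects-to"}
            self.links.append((from_id, relation, to_id))

        def evidence_for(self, claim_id):
            """Return the claims offered as support for a given claim."""
            return [self.claims[f] for (f, rel, t) in self.links
                    if t == claim_id and rel == "supports"]

    amap = ArgumentMap()
    amap.add_claim("c1", "The team should adopt an argument mapping tool.")
    amap.add_claim("c2", "Maps make evidential relationships explicit.")
    amap.add_claim("c3", "Learning the notation costs users extra effort.")
    amap.relate("c2", "supports", "c1")
    amap.relate("c3", "objects-to", "c1")
    print(amap.evidence_for("c1"))

The point of the sketch is only that every contribution must be recast as a claim plus a typed link, which is the formalization step whose grounding cost is discussed above.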
    2010 CRC PhDStudent Conference Online argument mapping tools leave users blind to a range of information that is readily available in face-to-face interaction (Smith and Fiore, 2001) and this hamper the level of acceptance by users. This suggests that any mediated conversation has a higher grounding cost compared to face-to-face conversation. Clark and Brennan (1991) and Kraut et al. (2002) identify ten constraints that a medium can impose on conversation among people. These constraints are desirable to reduce the ambiguity and grounding costs in conversation. Indeed, when one of them is missing, there will be a higher grounding cost, since people will be forced to use alternative grounding techniques. Argumentation technology adds a further constraint to the conversation because it forces users to respect pre-established communication formats and rules. Therefore, the loss of immediacy, due to the formalization, coupled with the lack of information about users, interaction processes, and generated content, entails the users a higher cognitive effort and time consuming to learn how to use the tool. This makes the benefit/cost ratio too low for the average user, thus causing limited adoption (Davis, 1989). As the Technology Acceptance Model (TAM) suggests, in order a technology to be adopted, it is necessary that the benefits are higher than the costs deriving from the use of it. To tackle this problem, we propose a Debate dashboard in order to provide users with visual feedback about the interaction between users and the content generated by them. This feedback aims at reducing grounding costs and making the benefits associated with using of arguments maps more evident. The dashboard will be composed of visualization tools which deliver such feedback. We will distil the Dashboard features by building on results of a literature review on Web 2.0 tools for data visualization. In particular we will select those tools that have proved to help effectively representing huge amounts of data and to facilitate human understanding so that salient information becomes apparent (Nguyen & Zhang, 2006). Design/methodology/approach – We propose a literature review of existing visualization tools. We analysed thirty visualization tools, which have been classified on the basis of the kind of feedback they are able to provide. We identify three classes of feedback: Community feedback (identikit of users), Interaction feedback (about how users interact) and Absorption feedback (about generated content and its organization). We have to clarify that we focused on visualization tools already implemented and in use in real online communities and not on those that were only defined and projected “on the paper”. We analysed each of them to understand what are their key features, how they work, what kind of feedback they provide, and if there are any “best practices”; in other words, we used them to “inspire” the design and the implementation of the Debate Dashboard. As output of literature review, we selected the following six visualization tools (see table 1): As main criteria for the selection of the visualization tools, we considered: • the number of feedback that each of them provides, in order to reduce the number of used visualization tools; • the combination of feedback, in order to provide all individualized ones. Page 87 of 125
    2010 CRC PhDStudent Conference Table 1: Selected visualization tools Visualization Chat Comment Conversation Exhibit PeopleGarden Wordle Tool Circles Flow Map II Copresence X Cotemporality X Mobility X Simultaneity X Sequentiality X Visibility X Relevance X Structuring X Profile X Activity Level X Social/organiz X ational structure As we have already mentioned, we consider these selected tools as a sort of starting point. Indeed, our aim is the improvement of them through the addition of further features and functions in order to make them more effective in providing feedback. On the basis of these six visualization tools, we set up an early visual prototype of the Debate Dashboard. We will test the Debate dashboard both through mapping tool expert interviews and through a survey with a semi-structured questionnaire. The tests aim at verifying if, providing feedback about users, interaction process and generated content, effectively reduces the grounding and sense-making costs; in other words, we want to corroborate that this feedback reduces the users’ cognitive effort of using online argument mapping tools. Originality/value – Our paper enriches the debate about computer mediated conversation and visualization tools. We propose a Dashboard prototype to augment collaborative argument mapping tools by providing visual feedback on conversations. The Dashboard will provide at the same time three different kinds of feedback about: details of the participants to the conversation, interaction processes and generated content. This will allow the improvement of the benefits and reduce the costs deriving from the use of argument mapping tools. Moreover, another important novelty is that visualization tools will be integrated to argument mapping tools, as until now they have been used only to visualize data contained in forums (as Usenet or Slash.dot), chat or email archives. Practical implications – The Dashboard provides feedback about participants, interaction processes and generated contents, thus supporting the adoption of online argument mapping tools as technologies able to foster knowledge sharing among remote workers or/and customers and supplier. Based on this assumption several achievable advantages can be identified: • Improvement of the coherence of discussion (Donath, 2002) - this feedback helps users to participate the conversation in the right way, as it allows users to understand participation rules, the structure of discussion and its evolution; Page 88 of 125
    2010 CRC PhDStudent Conference • Easy individualization of workers’ knowledge, skills and competencies - this happens because in every moment we can know who is talking about what and therefore who has that information. This allows one to identify who are the “right” people, who have the skills and knowledge to help co-workers and managers achieve their goals (Danis, 2000); • Development/Increase of awareness of presence and activity of other workers (Erickson, 2000) - the awareness of activity of collaborators enables people to guide their individuals efforts and contribute towards reaching a collaborative goal. This plays an important role in enabling effective collaboration among distributed work group members; • Reduction of misunderstanding; • Reduction of cognitive effort required to use mapping tools; • Improvement of the exploration and the analysis of the maps - this feedback improves the usability of the object (the map) improves, thus allowing users to pitch into the conversation in the right place. Keywords: Debate dashboard, On-Line knowledge sharing, Visualization tools, grounding cost. Page 89 of 125
    2010 CRC PhDStudent Conference References Buckingham Shum, Simon and Hammond, Nick. 1994. “Argumentation-based design rationale: What use at what cost?”. International Journal of Human-Computer Studies, 40(4):603-652. Clark, Herbert H. and Brennan, Susan E., 1991, “Grounding in communication”. In Resnick, Lauren B. Levine, John M. Teasley, Stephanie D. (Ed), Perspectives on socially shared cognition, Washington, DC, US: American PsychoKieslerlogical Association, pp. 127-149. Danis, Catalina M., 2000, Extending the Concept of Awareness to include Statistic and Dynamic Person Information. SIGGROUP Bulletin, 21(3), pp.59-62. Davis, Fred. 1989. "Perceived Usefulness, Perceived Ease of Use, and User Acceptance of Information Technology," MIS Quarterly, 13(3): 319-340. Donath, Judith, 2002. “A Semantic Approach to Visualizing Online conversation”, Communication of the ACM, 45(4):45-49. Conklin, Jeff, 2006. Dialogue Mapping: Building Shared Understanding of Wicked Problems. Chichester: Wiley. Erickson, Thomas, and Kellogg, Wendy A. 2000. “Social translucence: an approach to designing systems that support social processes”. ACM Trans. Computer-Human Interaction,7(1):59-83. Kraut, Robert E., Fussell, Susan R., Brennan, Susan E., and Siegel, Jane, 2002, “Understanding Effects of Proximity on Collaboration: Implications for Technology to Support Remote Collaborative Work”. In Pamela Hinds and Sara Kiesler (Eds), Distributed Work, Massachusetts Institute of Technology, pp.137- 162. Nguyen, Tien N. and Zhang Jin. 2006. “A Novel Visualization Model for Web Search Results”. IEEE Transaction On Visualization and Computer Graphics, 12(5):981- 988. van Gelder, Tim, 2003, Enhancing deliberation through computer supported argument mapping. In Visualizing Argumentation, eds P.A. Kirschner, S.J. Buckingham Shum, and C.S. Carr, pp. 97-115. London:Routledge. Page 90 of 125
    2010 CRC PhDStudent Conference Supporting multimodal media recommendation and annotation using social network analysis Adam Rae a.rae@open.ac.uk Supervisors Stefan Rüger, Suzanne Little, Roelof van Zwol Department The Knowledge Media Institute Status Full Time Probation Viva After Starting Date October 2007 Research Hypothesis By analysing and extracting information from the social graphs de- scribed by both explicit and implicit user interactions, like those found in online media sharing systems like Flickr1 , it is possible to augment existing non-social aware recommender systems and thereby significantly improve their performance. Large scale web based systems for sharing media continue to tackle the problem of helping their users find what they are looking for in a timely manner. To do this, lots of good quality metadata is required to sift through the data collection to pick out exactly those documents that match the information need of the user. In the case of finding images from the online photo sharing website Flickr, this could be from over 4 billion examples. How can we help both the system and the user in enriching the metadata of the media within the collection in order to improve the experience for the user and to reduce the burden on the underlying data handling system? Can modelling users, by themselves and within the context of the wider online community help? Can this modeling be used to improve recommender systems that improve the experience and reduce cognitive burden on users? Existing approaches tend to treat multimedia in the same way they have dealt with text documents in the past, specifically by treating the textual meta- data associated with an image as a text document, but this ignores the inherently different nature of the data the system is handling. Images are visual data, and while they can be described well by textual metadata, they cannot be described completely by it. Also, the user cannot be ignored in the retrieval process, and learning more about a user provides information to the system to tailor results to their specific requirements. Users interact online, and these interactions form a 1 http://www.flickr.com/ Page 91 of 125
    2010 CRC PhDStudent Conference new type of data that has yet to be fully explored nor exploited when modelling users. The work presented here combines the mining of social graphs that occur in Flickr with visual content and metadata analysis to provide better personalised photo recommender mechanisms and the following experiment and its analysis are a major component in my overall thesis. Interaction Scenario In order to address this research question, multiple experiments have been car- ried out, one of which I present here: Envisage an incoming stream of photos made available to a user. In systems of a scale similar to Flickr, this could be thousands of im- ages per second. Can a system that uses cues from the social, visual and semantic aspects of these images perform better than one that uses the more traditional approach of using only semantic informa- tion, according to specifically defined objective metrics? How does performance vary between users? An experiment was carried out that mines data from the social communities in Flickr, from the visual content of images and from the text based metadata and uses a machines learning mechanism to merge these signals together to form a classifier that, given a candidate image and prospective viewing user, decides whether the user would label that image as a ‘Favourite’2 - see Figure 1. Related Work The significant influence that our peers can have on our behaviour online has been studied by researchers such as Lerman and Jones[3], and the particular interaction that occurs between users and visual media in particular in the work of Nov et al.[4]and Kern et al[2]. Their insights into the importance of understanding more about a user in order to best fulfil their information need supports the hypothesis that this kind of information can be usefully exploited to improve systems that try to match that need to a data set supported by social interaction. Here I extend their ideas by incorporating this valuable social data into a complementary multimodal framework that takes advantage of multiple types of data. The use of social interaction features in the work of Sigurbjörnsson and van Zwol[7] and Garg and Weber[1] inspired my more comprehensive feature set and its analysis. Their findings that aggregating data generated from online communities is valuable when suggesting tags is important and I believe also transfers to recommendation in general as well as to the specific task of recom- mending images. In fact, I demonstrated this in previous work on social media tag suggestion[6]. I use some of the human perception based visual features outlined in the work of San Pedro and Siersdorfer[5], as these have been shown to work well in similar experimental scenarios and cover a range of visual classes. I extend them further with a selection of other high performing visual features. 2A binary label Flickr users can use to annotate an image they like. Page 92 of 125
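The abstract does not name the learner used to merge the three signal types, so the following is only an illustrative Python sketch: concatenated feature vectors for (user, image) pairs, a logistic-regression classifier from scikit-learn standing in for the unspecified machine learning mechanism, and a precision-at-k helper matching the evaluation metric mentioned. All feature values, dimensions and labels are invented.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def merge_features(textual, visual, social):
        """Concatenate the per-(user, image) feature groups into one vector."""
        return np.concatenate([textual, visual, social])

    def precision_at_k(ranked_labels, k):
        """Fraction of the top-k ranked items that are true 'Favourites'."""
        return sum(ranked_labels[:k]) / float(k)

    # Invented toy data: each row represents one (user, candidate image) pair.
    rng = np.random.default_rng(0)
    X = np.vstack([merge_features(rng.random(5), rng.random(4), rng.random(3))
                   for _ in range(200)])
    y = rng.integers(0, 2, size=200)      # 1 = user marked the image a 'Favourite'

    clf = LogisticRegression(max_iter=1000).fit(X[:150], y[:150])

    # Rank unseen candidate images by predicted 'Favourite' probability.
    scores = clf.predict_proba(X[150:])[:, 1]
    order = np.argsort(-scores)
    ranked_labels = y[150:][order]
    print("P@5 :", precision_at_k(ranked_labels, 5))
    print("P@10:", precision_at_k(ranked_labels, 10))

Any binary classifier that outputs a score would do here; the point is only that the three feature groups enter a single model and that ranking candidates by that score lets precision at 5 and 10 be read off directly.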
Figure 1: Diagram of the image classification system used with Flickr data (an incoming stream of previously unseen candidate images passes through textual, visual and social feature extraction and a trained classifier to produce potential 'Favourite' images for each user).

Experimental Work
400 users of varying levels of social activity were selected from Flickr and their 'Favourite' labelled images collected. This resulted in a collection of hundreds of thousands of images. To train my classifier, these images were treated as positive examples of relevant images. I generated a variety of negative example sets to reflect realistic system scenarios. For all photo examples we extracted visual and semantic features, and social features that described the user, the owner of the photo, any connection between them as well as other behaviour metrics. We then tested our classifier using previously unseen examples and measured the performance of the system with a particular emphasis on the information retrieval metric of precision at 5 and 10 to reflect our envisaged use case scenario.

Results
An extract of the results from the experiment is shown in Table 1. They can be summarised thus:
• It is possible to achieve high levels of precision in selecting our positive examples, especially by using social features. This performance increase is statistically significantly higher than the baseline Textual run. These social signals evidently play a significant rôle when a user labels an image a 'Favourite' and can be usefully exploited to help them.
• The value of individual types of features is complex, but complementary. The combined systems tend to perform better than the individual ones.
• It is far easier to classify photos that are not 'Favourites' than those that are, as shown by the high negative values. This can be used to narrow down the search space for relevant images by removing those that are obviously not going to interest the user, thus reducing load on both the user and the system.
System            Accuracy   + Prec.   + Rec.   - Prec.   - Rec.
Textual             0.87       0.48      0.18     0.88      0.97
Visual              0.88       1.00      0.09     0.88      1.00
Social              0.92       0.80      0.56     0.94      0.98
Textual+Visual      0.88       0.62      0.27     0.90      0.97
Textual+Social      0.92       0.77      0.62     0.94      0.97
Visual+Social       0.93       0.89      0.56     0.94      0.99
Text+Vis.+Soc.      0.93       0.84      0.62     0.94      0.98

Table 1: Accuracy, precision and recall for various combinations of features using the experiment's most realistic scenario data set. Photos labelled as 'Favourites' are positive examples, and those that are not are negative examples. Higher numbers are better.

• As is typical in this style of information retrieval experiment, we can trade off between precision and recall depending on our requirements. As we are interested in high precision in this particular experiment, we see that the combination of the Visual+Social and Text+Visual+Social runs gives good precision without sacrificing too much recall.

References
[1] Nikhil Garg and Ingmar Weber. Personalized, interactive tag recommendation for flickr. In Proceedings of the 2008 ACM Conference on Recommender Systems, pages 67–74, Lausanne, Switzerland, October 2008. ACM.
[2] R. Kern, M. Granitzer, and V. Pammer. Extending folksonomies for image tagging. In Workshop on Image Analysis for Multimedia Interactive Services, 2008, pages 126–129, May 2008.
[3] Kristina Lerman and Laurie Jones. Social browsing on flickr. In Proceedings of ICWSM, December 2007.
[4] Oded Nov, Mor Naaman, and Chen Ye. What drives content tagging: the case of photos on flickr. In Proceeding of the twenty-sixth annual SIGCHI conference on Human factors in computing systems, pages 1097–1100, Florence, Italy, 2008. ACM.
[5] Jose San Pedro and Stefan Siersdorfer. Ranking and classifying attractiveness of photos in folksonomies. In WWW, Madrid, Spain, April 2009.
[6] Adam Rae, Roelof van Zwol, and Börkur Sigurbjörnsson. Improving tag recommendation using social networks. In 9th International Conference on Adaptivity, Personalization and Fusion of Heterogeneous Information, April 2010.
[7] Roelof van Zwol. Flickr: Who is looking? In IEEE/WIC/ACM International Conference on Web Intelligence, pages 184–190, Washington, DC, USA, 2007. IEEE Computer Society.
    2010 CRC PhDStudent Conference The effect of Feedback on the Motivation of Software Engineers Rien Sach r.j.sach@open.ac.uk Supervisors Helen Sharp Marian Petre Department/Institute Computing Status Fulltime Probation viva After Starting date October 2009 Motivation is reported as having an effect on crucial aspects of software engineering such as productivity (Procaccino and Verner 2005), software quality (Boehm 1981), and a project’s overall success (Frangos 1997). Feedback is a key factor in the most commonly used theory in reports published on the motivation of software engineers (Hall et al. 2009), and it is important that we gain a greater understanding of the effect it has on the motivation of software engineers. My research is grounded in the question “What are the effects of feedback on the motivation of software engineers?”, and focuses on feedback conveyed in human interactions. I believe that before I can focus my question further I will need to begin some preliminary work to identify how feedback occurs, what types of feedback occur, and the possible impact of this feedback. Motivation can be understood in different ways. For example, as a manager you might consider motivation as something you must maintain in your employees to ensure they complete work for you as quickly as possible. As an employee you might consider motivation as the drive that keeps you focused on a task, or it might simply be what pushes you to get up in the morning and go to work. Herzberg (1987) describes motivation as “a function of growth from getting intrinsic rewards out of interesting and challenging work”. That’s quite a nice definition; and according to Herzberg motivation is intrinsic to one’s self. Ryan and Deci (2000) describe intrinsic motivation as “the doing of activity for its inherent satisfaction rather than for some separable consequence” (Page 60). Herzberg (1987) defines extrinsic factors as movement and distinguishes it from motivation, stating that “Movement is a function of fear of punishment or failure to get extrinsic rewards”. Ryan and Deci (2000) state that “Extrinsic motivation is a construct that pertains whenever an activity is done in order to attain some separable outcome”. There are 8 core motivational theories (Hall et al. 2009) and some of the theories focus on motivation as a “a sequence or process of related activities” (Hall et al. 2009) called process theories, while others focus on motivation “at a single point in time” (Couger and Zawacki 1980) called content theories. Page 95 of 125
    2010 CRC PhDStudent Conference As reported in a systematic literature review conducted by Beecham et al (2007), and their published review of the use of theory inside this review in 2009 (Hall et al 2009), the three most popular theories used in studies of motivation in Software Engineering were Hackman and Oldman’s Job Characteristics Theory (68%), Herzberg’s Motivational Hygiene Theory (41%), and Maslow’s Theory of Needs (21%)1. Hackman and Oldman’s Job Characteristics Theory focuses on the physical job, and suggests five characteristics (skill variety, task identity, task significance, autonomy, and feedback) that lead to three psychological states which in turn lead to higher internal motivation and higher quality work. Herzberg’s Hygiene Theory suggests that the only true motivation is intrinsic motivation, and this leads to job satisfaction, where extrinsic factors are only useful in avoiding job dissatisfaction. One of the five key job characteristics in Hackman and Oldman’s theory is feedback. Feedback is not explicitly mentioned in Herzberg’s Motivational Hygiene Theory, but he notes that it is a part of job enrichment, which he states is “key to designing work that motivates employees” (Herzberg 1987). However this is a managerial view point. Software Engineers are considered to be current practitioners working on active software projects within the industry. This includes programmers, analysts, testers, and designers who actively work and produce software for real projects in the real world. From a management perspective, gaining a greater understanding of what motives employees could prove invaluable in increasing productivity and software quality, and from an individual perspective the prospect of being given feedback that motivates you and makes your job more enjoyable and improves the quality of your work experience could lead to a more successful and enjoyable work life. My proposed research is divided into stages. In the first stage I plan to conduct interviews and diary studies to identify the types of feedback in software engineering and how feedback is experienced by software engineers. I then plan to conduct additional studies to identify what impact this feedback has on software engineers and how that impact is evident. Finally, I plan to observe software engineers at work to see feedback in context, and to compare those observations to the information gathered during the first two stages. At the end of my PhD I hope to accomplish research that leads to a greater understanding of what feedback is inside software engineering and how it is given or received. Subsequently I wish to gain an understanding of how this feedback alters the motivation of software engineers and how this manifests as something such as behaviour, productivity or attitude. 1 The percentages are a representative of how many of 92 papers the theories were found to be explicitly used in. There can be multiple theories used in any one paper, and the 92 papers were part of a systematic literature review conducted by Hall et al (2007) sampling over 500 players. Page 96 of 125
    2010 CRC PhDStudent Conference References B.W. Boehm, Software Engineering Economics, Prentice-Hall, 1981. COUGER, J. D. AND ZAWACKI, R. A. 1980. Motivating and Managing Computer Personnel. John Wiley & Sons. S.A. Frangos, “Motivated Humans for Reliable Software Products,” Microprocessors and Microsystems, vol. 21, no. 10, 1997, pp. 605–610. Frederick Herzberg, One More Time: How Do You Motivate Employees? (Harvard Business School Press, 1987). J. Procaccino and J.M. Verner, “What Do Software Practitioners Really Think about Project Success: An Exploratory Study,” J. Systems and Software, vol. 78, no. 2, 2005, pp. 194–203. Richard M. Ryan and Edward L. Deci, “Intrinsic and Extrinsic Motivations: Classic Definitions and New Directions,” Contemporary Educational Psychology 25, no. 1 (January 2000): 54-67. Tracy Hall et al., “A systematic review of theory use in studies investigating the motivations of software engineers,” ACM Trans. Softw. Eng. Methodol. 18, no. 3 (2009): 1-29. Sarah Beecham et al., “Motivation in Software Engineering: A systematic literature review,” Information and Software Technology 50, no. 9-10 (August 2008): 860-878. Page 97 of 125
    2010 CRC PhDStudent Conference Using Business Process Security Requirements for IT Security Risk Assessment Stefan Taubenberger stefan.taubenberger@web.de Supervisors Bashar Nuseibeh Jan Jürjens Charles Haley Department/Institute Computing Status Part-time Probation viva After Starting date October 2007 Companies and governmental organizations are suffering from information technology (IT) risks caused by malicious or negligent events and by inappropriate process designs related to authorization, access control or segregation of duties. Examples of such events are the loss of two data discs of 25 million child benefit records in the UK or the trading losses at Société Générale. Many quantitative and qualitative methods and toolkits for IT security risk analysis have been developed using e.g. Bayesian probability, Fuzzy theories, Courtney, the Livermore risk analysis methodology (LRAM)… all of which are based on probabilities and events as risk is defined e.g. in ISO 27002 as a “combination of the probability of an event and its consequence” ([3], p. 2). But with these traditional risk analysis approaches, IT risks often cannot be determined reliably and with precision. Because security events are difficult to identify in a way that guarantees correctness and completeness of this process, since the methods provide only general descriptions how to identify them [7]. Probabilities in practice are difficult to estimate with sufficient degree of precision and reliability as statistical data is missing or outdated [6] and influenced by perception [5]. IT security risk assessment approaches using business process models and security requirements provide a way which may overcome these limitations. The usage of security requirements as well as business or critical assets for risk assessment is not new and in general described in the ISO 27000 series as well as implemented in approaches like Octave Allegro [1]. However, existing standards and approaches like the ISO 27000 series or Octave Allegro referring to or utilizing security requirements are based on events/threats and probabilities. Threat based approaches face limitations regarding precision and reliability as they base on probabilities/impact estimates as well as on correct event identification. Furthermore, these approaches do not determine the risk of non- adherence or correct implementation of requirements. Other approaches using security requirements without threats determine best security solutions for processes [2] or analyse process security [4] but do not determine risks. Approaches that determine security solutions or analyze process security are limited as they do not evaluate the security risk of the current implementation. In addition, most risk assessment approaches omit risks originating from the business process design and data flow as well as do not consider any security dependencies as the all evaluate single Page 98 of 125
2010 CRC PhD Student Conference

decomposed model elements. Additionally, the assessment results depend on a single point in time and do not consider the changing environment.

In contrast to existing approaches, we suggest basing an IT security risk assessment approach on business process security requirements and on evaluating the corresponding security controls and security processes. We evaluate the process security requirements for each business object of a process, including system, personnel, physical and execution requirements; we consider security dependencies between processes; and we evaluate standard IT security processes. The advantages of such an approach are that events and probabilities do not have to be determined, that business activity sequences and security dependencies are considered, and that the risk results are more independent of a point in time. Furthermore, such an approach would support the understanding and definition of security requirements from a process and risk perspective.

Research objective

The objective of our research is to provide insights into how to conduct a risk assessment based solely on the verification of security requirements and implemented security controls. The main focus is the link between security requirements and security controls, and whether a risk assessment can be based entirely on security requirements rather than identifying risk through events and probabilities. To achieve this objective we address the following research questions:

1) Can IT security risks be evaluated using only security requirements, without threats and probabilities, with the same quality/precision as in traditional approaches?
2) If we use a security-requirements-based risk assessment approach:
a) How can the evaluation of security requirements be better supported to help identify and evaluate risks?
b) How can we account for dependencies between security objectives or security requirements that influence the risk assessment result?
c) Can we provide risk assessment results that are more independent of a point in time by checking security processes?

Problems with risk assessments

The issues with traditional risk assessment approaches stem from the definition of risk in terms of events, probabilities and impact. To identify and determine each parameter in a risk assessment, we need comprehensive knowledge about the direct environment of the risk, e.g. a company, as well as the outside environment, i.e. everything else. In reality, comprehensive knowledge about the direct and outside environment is not available, may be compromised, and cannot be modelled, because the real world is too complex and unpredictable. Even if comprehensive knowledge were obtainable, we currently do not know how to achieve or verify it. Another fallacy is the attempt to determine risk exactly using probabilities. This would require that all parameters, their probabilities and their correlations are known, immediately updated, based on sufficient statistical data and amenable to modelling. In practice this is not the case; instead we have to deal with uncertainty, which is not considered in current approaches, as well as incomplete and unverified data. Furthermore, risk is about people, whose behaviour is neither objective nor rational and may follow personal interests. Especially in the risk estimation, evaluation and mitigation phases, behavioural biases influence assessments and decisions because of knowledge, perception, personal objectives and herd Page 99 of 125
2010 CRC PhD Student Conference

instincts. Risk results are therefore biased, without any indication of the direction of the bias. In addition, risk is taken by people, not by a company or institution: it is not the company that is at risk, but rather the managers or shareholders of that company. For all these reasons, the methods developed so far can only be attempts to determine risk, and we believe they are imprecise, biased and never fully accurate.

Our approach

The objective of our approach is to identify the critical risks of a company based on business process models and security requirements. We assume that business process models are available and up to date, and we use standard methods and concepts from the software engineering domain. Our approach probably will not identify all possible risks, as it concentrates on critical ones.

Figure 1. SR risk assessment approach.

Our approach follows the general risk management and security requirements elicitation process: identify assets, identify requirements and assess them (fig. 1). The business process model assessment (left side of figure 1) has three stages: the identification of critical business processes and business objects from existing business process models, the definition of the business process security requirements, and the assessment of the security requirements at each data processing point. The second of these stages can be restarted and is therefore iterative. The IT process assessment (right side of figure 1) also consists of three stages: the definition of the IT security standard process model used, the selection of the security processes to be assessed, and the assessment of those processes. The requirements assessment and the process assessment are linked, because results of the IT security process assessment can affect the requirements results when security objectives or requirements are violated.

Current work

We are currently completing the validation of our approach. We have chosen to validate it by testing: we applied the approach to several real-world examples within a reinsurance company. Our results support our assertion that risks can be determined by evaluating security requirements. Further work will concentrate on discussing validation issues and on describing how our approach could be integrated with and utilized in traditional approaches. Page 100 of 125
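To illustrate the general idea of assessing risk from security requirements and implemented controls rather than from event probabilities, the following is a minimal sketch under stated assumptions, not the authors' method: a business object carries categorised security requirements whose controls have been verified or not, and risk findings are derived from unmet requirements and from dependencies on other processes. All names (BusinessObject, Requirement, assess) and the example data are hypothetical.

# Minimal sketch (not the authors' implementation): deriving risk findings from
# unmet business-process security requirements, without event probabilities.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Requirement:
    description: str
    category: str          # e.g. "system", "personnel", "physical", "execution"
    satisfied: bool        # result of verifying the implemented control
    critical: bool = True

@dataclass
class BusinessObject:
    name: str
    requirements: List[Requirement] = field(default_factory=list)
    depends_on: List["BusinessObject"] = field(default_factory=list)

def assess(obj: BusinessObject) -> List[str]:
    """Return risk findings: every unsatisfied requirement of the object itself,
    plus unsatisfied critical requirements of the processes it depends on."""
    findings = [f"{obj.name}: unmet {r.category} requirement: {r.description}"
                for r in obj.requirements if not r.satisfied]
    for dep in obj.depends_on:
        findings += [f"{obj.name}: dependency {dep.name} violates: {r.description}"
                     for r in dep.requirements if r.critical and not r.satisfied]
    return findings

if __name__ == "__main__":
    ledger = BusinessObject("claims ledger", [
        Requirement("segregation of duties for postings", "execution", satisfied=False),
        Requirement("access restricted to claims staff", "system", satisfied=True),
    ])
    payout = BusinessObject("payout process", [
        Requirement("four-eyes approval above threshold", "execution", satisfied=True),
    ], depends_on=[ledger])
    for finding in assess(payout):
        print(finding)

In this toy run, the payout process is flagged not because of its own requirements but because a process it depends on violates a critical requirement, mirroring the dependency evaluation described above.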
2010 CRC PhD Student Conference

References

[1] Richard Caralli, James Stevens, Lisa Young, and William Wilson. Introducing OCTAVE Allegro: Improving the Information Security Risk Assessment Process. The Software Engineering Institute, 2007.
[2] Peter Herrmann and Gaby Herrmann. Security requirement analysis of business processes. Electronic Commerce Research, 6:305–335, 2006.
[3] International Organization for Standardization (ISO). ISO 27002 Information technology - Security techniques - Code of practice for information security management. ISO, 2005.
[4] Alexander W. Roehm, Guenther Pernul, and Gaby Herrmann. Modelling secure and fair electronic commerce. In Proceedings of the 14th Annual Computer Security Applications Conference, Phoenix, Arizona, Dec. 7-11, 1998. IEEE Computer Society Press, 1998.
[5] Andrew Stewart. On risk: perception and direction. Computers & Security, 23:362–370, 2004.
[6] Lili Sun, Rajendra Srivastava, and Theodore Mock. An information systems security risk assessment model under Dempster-Shafer theory of belief functions. Journal of Management Information Systems, 22(4):109–142, 2006.
[7] Stilianos Vidalis. A critical discussion of risk and threat analysis methods and methodologies. Technical Report CS-04-03, University of Glamorgan, Pontypridd, 2004. Page 101 of 125
    2010 CRC PhDStudent Conference Distilling Privacy Requirements for Mobile Applications Keerthi Thomas k.thomas@open.ac.uk Supervisors Prof. Bashar Nuseibeh Dr. Arosha Bandara Mr. Blaine Price Department/Institute Computing Status Part-time Probation viva After Starting date Oct. 2008 As mobile computing applications become commonplace, eliciting and analysing users’ privacy requirements associated with these applications is increasingly important. Such mobile privacy requirements are closely linked to both the physical and socio-cultural context in which the applications are used. Previous research by Adams and Sasse [1] has highlighted how system designers, policy makers and organisations can easily become isolated from end-users’ perceptions of privacy in different contexts. For mobile applications, end-users’ context changes frequently and Mancini et al.’s observations of such users [2] suggest that changes in users’ context result in changes in the users’ privacy requirements. Omitting these privacy requirements not only affects the user’s privacy but also has an impact on how well the system is adopted or utilised. Moreover, the design of technologies influencing privacy management is often considered and addressed as an afterthought [3], when in fact the guarantees and assurances of privacy should have been included in the design right from the outset. The aim of my research is therefore to ensure that privacy requirements of mobile systems are captured early, together with the specification of the possible variations in these systems’ operating context. Privacy requirements have been analysed from different perspectives by the requirements engineering community. Anton et al. [4] explored the role of policy and stakeholder privacy values, Breaux and Anton [5] modelled requirements based on privacy laws such as HIPAA, and Cranor et al. [6] represented her requirements using privacy policies of various online organisations. Some researchers have modelled privacy as part of a wider modelling effort. For example, Yu and Cysneiros [7] characterised privacy as a non-functional requirement in i* using OECD guidelines [8], and Kalloniatis et al. [9] described a security engineering method to incorporate privacy requirements early in the system development process. However, I am not aware of any work that specifically focuses on the challenges of understanding the privacy requirements associated with mobile computing applications. Eliciting end-user privacy requirements for mobile applications is both sensitive and difficult. Questionnaires do not reveal the ‘real’ choices end-users make because the decisions are influenced by the emerging context in a particular situation. Shadowing users for long hours is neither practical nor useful as the experience of being under observation is likely to change the behaviour of the users in ways that invalidate any observed behaviours that relate to privacy. Mancini et al.’s prior work [2] showed that privacy preferences and behaviours in relation to mobile applications are closely linked to socio-cultural, as well as to physical, boundaries that separate different contexts in which the applications are used. From the literature survey carried out earlier, I am not aware of any requirements engineering process that specifically supported the elicitation of privacy requirements for mobile or context-aware systems. 
Given the complexities and the need to elicit privacy requirements for mobile systems, the aim of my research is therefore to address the following questions: Page 102 of 125
    2010 CRC PhDStudent Conference (i) What are the end-user privacy requirements for mobile applications? (ii) How can privacy requirements be elicited for mobile applications? What elicitation techniques, requirement models and analysis methods are needed in the privacy requirements engineering process? To address these research questions, I present a systematic approach to modelling privacy requirements for mobile computing applications where I demonstrate how requirements are derived (“distilled”) from raw empirical data gathered from studying users of mobile social networking applications. I propose the use of a user-centric privacy requirements model that combines relevant contextual information with the users’ interaction and privacy perceptions of the mobile application. The development of this model was informed by empirical data gathered from my previous studies of mobile privacy [2]. Finally, I validate my work by using the model as the basis for extending existing requirements modelling approaches, such as Problem Frames. I show how the extended Problem Frames approach can be applied to capture and analyse privacy requirements for mobile social networking applications. References [1] Adams, A. and Sasse, M.A., Privacy issues in ubiquitous multimedia environments: Wake sleeping dogs, or let them lie? in Proc. of INTERACT ’99, Edinburgh, 1999, pp. 214-221J. [2] Mancini, C., et al., From spaces to places: emerging contexts in mobile privacy. in Proc. of the 11th Int, Conf. on Ubiquitous computing, Orlando, FL, 2009, pp. 1-10. [3] Anton, A.I. and Earp, J.B., Strategies for Developing Policies and Requirements for Secure Electronic Commerce Systems. in 1st ACM Workshop on Security and Privacy in E-Commerce, Athens, Greece, 2000, pp. unnumbered pages. [4] Anton, A.I., Earp, J.B., Alspaugh, T.A., and Potts, C., The Role of Policy and Stakeholder Privacy Values in Requirements Engineering. in Proc. of the 5th IEEE Int. Symp, on Requirements Engineering, 2001, pp.138. [5] Breaux, T.D. and Anton, A.I., Mining rule semantics to understand legislative compliance. in Proc. of the 2005 ACM workshop on Privacy in the electronic society, Alexandria, VA, USA, 2005, pp. 51 - 54 [6] Cranor, L.F., 1998. The platform for privacy preferences. Communications of ACM 42 (2), 48–55. [7] Yu, E. and L.M. Cysneiros. Designing for Privacy and Other Competing Requirements. in 2nd Symp. on Requirements Engineering for Information Security (SREIS'02). 2002. Raleigh, North Carolina. [8] “Inventory of instruments and mechanisms contributing to the implementation and enforcement of the OCDE privacy guidelines on global networks” Head of Publications Services, OECD, 2 rue- André-Pascal, 75775 Paris Cedex 16, France. [9] Kalloniatis, C., Kavakli, E., and Gritzalis, S. Addressing privacy requirements in system design: the PriS method Requirements Engineering, Springer London, 13 (3). pp. 241-255. Page 103 of 125
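As an illustration of the kind of user-centric privacy requirements model described in this work, the following is a minimal sketch under my own assumptions rather than the model itself: a "distilled" privacy requirement ties what a mobile application discloses, and to whom, to the socio-cultural and physical context of disclosure. The PrivacyRequirement fields, the applicable helper and the example data are all hypothetical.

# Illustrative sketch only: a context-dependent mobile privacy requirement.
from dataclasses import dataclass

@dataclass
class PrivacyRequirement:
    actor: str            # whose privacy is at stake
    information: str      # what is disclosed, e.g. "current location"
    recipient: str        # who may receive it
    context: str          # socio-cultural/physical context, e.g. "working hours"
    constraint: str       # the requirement distilled from empirical data

requirements = [
    PrivacyRequirement("commuter", "current location", "close friends",
                       "evening, off work", "share at place granularity only"),
    PrivacyRequirement("commuter", "current location", "colleagues",
                       "working hours", "suppress automatic disclosure"),
]

def applicable(reqs, recipient, context):
    # Select the requirements that must hold before the app discloses anything
    # to this recipient in this context.
    return [r for r in reqs if r.recipient == recipient and r.context == context]

print(applicable(requirements, "colleagues", "working hours"))

The point of the sketch is only that the same piece of information can carry different constraints in different contexts, which is what makes context a first-class part of the requirement rather than an afterthought.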
    2010 CRC PhDStudent Conference Understanding the Influence of 3D Virtual Worlds on Perceptions of 2D E-commerce Websites Minh Q. Tran Centre for Research in Computing The Open University m.tran@open.ac.uk Supervisors Dr. Shailey Minocha Prof. Angus Laing Centre for Research in Computing Business School The Open University Loughborough University s.minocha@open.ac.uk a.w.laing@lboro.ac.uk Dr. Darren Langdridge Mr. Dave Roberts Department of Psychology Centre for Research in Computing The Open University The Open University d.langdridge@open.ac.uk d.roberts@open.ac.uk Department: Computing Status: Full-time Probation viva: Passed July 2009 Starting date: October 2008 Introduction The aim of our research is to understand consumers’ experiences in 3D virtual worlds (VWs) and how those experiences influence consumers' expectations of 2D e-commerce websites. As consumers become familiar with the affordances and capabilities of 3D VWs, do their expectations of 2D e- commerce websites change? The outcome of this research project will be an understanding of consumers’ experiences in 3D VWs and 2D e-commerce websites. Furthermore, design guidelines will be developed for e-commerce in 3D VWs and for the integration of 3D VWs with 2D e-commerce websites. 3D Virtual Worlds 3D VWs are online, persistent, multi-user environments where users interact through avatars [2]. Avatars are digital self-representations of users. Through avatars, users can walk in simulated physical spaces, talk to other avatars and interact with the environment. This opens up different possibilities for interaction; both in terms of human-computer interaction (HCI) and also business-to-consumer (B2C) interactions. Users may be able to browse through virtual markets, shop with their friends and interact in real-time with vendors [10]. These features suggest shopping in 3D VWs may be more immersive compared to shopping on websites [7]. E-commerce in Second Life Second Life (SL) is a 3D VW. SL does not cost any money to use. It is also an open-ended platform; users of SL are encouraged to create their own content and design their own activities. Users can sell any content (objects, scripts, animations) that they make. Content can also be bought from others. As a consequence, SL has developed its own virtual economy [6], including having virtual stores to shop from (Figure 1). Figure 1. Stores in Second Life. Page 104 of 125
2010 CRC PhD Student Conference

Currently, the economy in SL mainly involves virtual items, such as virtual clothes, avatar models, homes and land. However, there is potential for real business involving real-world items. Some companies, such as Coca-Cola and Adidas, have already used SL to advertise their products [12]. As the popularity of 3D VWs grows, more companies are likely to use 3D VWs for e-commerce beyond marketing and advertising. 3D VWs have the potential to become a platform for buying and selling real items, just as websites are today. However, successful implementation of e-commerce in 3D VWs will require an understanding of what influences the user experience [11].

Research Objectives

The goal of this research is to investigate the affordances of 3D VWs and their influence on consumers' perceptions and expectations of 2D e-commerce websites. This understanding will be used to develop guidelines for designing positive e-commerce experiences in 3D VWs and on 2D e-commerce websites. The research questions are:
RQ1: What are consumers' experiences in 3D VWs?
RQ2: What are the perceptions and expectations of 2D e-commerce websites among consumers who have experience of VWs?
RQ3: What are the differences in expectations and behaviours between consumers in 3D VWs and on 2D e-commerce websites?

Online Service Encounter

Consumers' experiences are based on what occurs during the service encounter. The service encounter refers to all interactions between a consumer and a service provider for the exchange of a product or the provision of a service. According to the service encounter model, a full understanding of the experience involves looking at what happens before, during and after a purchase (Figure 2).

Figure 2. Model of the service encounter [3].

Furthermore, consumers can now choose between different commerce channels (websites, high street, telephone, etc.). Therefore, consumers' experiences are based not only on the performance of individual channels, but also on how well the channels are integrated to provide a positive and seamless experience. This research focuses on two commerce channels in particular: 3D VWs and 2D websites.

Affordances of 3D VWs

3D VWs support the service encounter in different ways compared to 2D websites. For example, having products rendered in 3D can improve product 'diagnosticity' [8]. Diagnosticity refers to how easily a consumer can judge whether a product fits their needs. An interactive 3D model of a product gives users more information about its form and function. Therefore, users may be able to make more informed purchase decisions when shopping in VWs because they have a better idea of what the product is like. Another advantage is the multi-user, synchronous environment. VWs produce the sense of 'being there', also referred to as 'presence' [13]. A sense of 'being there' with others is also possible because avatars are located in the same virtual space; users can 'see' each other. As a result, the e-commerce experience has a social dimension that is not experienced when shopping on websites.

Affordances of 2D Websites

Websites have their own advantages that VWs do not. Presently, websites can provide more information than VWs because they use text effectively [5]. The advantage of text is that it can describe many details about a product, such as specifications and warranties, which cannot easily be conveyed through images or 3D models. The web also has the advantage of being faster than 3D VWs because of its low bandwidth and CPU requirements.
Page 105 of 125
    2010 CRC PhDStudent Conference Methodology The methodology of this research project is empirical and qualitative. Three studies involving users are planned (Figure 3). The first two studies are based on in-depth interviews. The interviews will be conducted in SL. During the interviews, participants are encouraged to describe their own shopping experiences in detail and from their own subjective viewpoint. The interview technique is based on phenomenology [4]. Phenomenological interviews, and subsequent phenomenological analysis, allow the researcher to obtain the structure and content of experience. During the interviews, each participant is asked to describe the pre-purchase, purchase and post-purchase interactions from a service encounter. The data consists of descriptions of shopping experiences, including behaviours, thoughts and feelings. For this project, data analysis includes both descriptive phenomenological analysis [4] and a general thematic analysis [1]. A descriptive phenomenological analysis of each interview produces use cases (or individually structured narratives). Thematic analysis produces a set of themes relating to affordances and user experience. The use cases and themes provide grounding to reason about design implications and design guidelines. Design guidelines will be validated through a third study. The guidelines will be evaluated by users who have experience creating content in 3D VWs and websites. Part of the validation study will involve making the guidelines usable for the intended audience of designers and marketers. Figure 3. Project methodology Preliminary Findings The first study is now complete. A list of themes based on affordances and use cases are being compiled. The aim is to provide a comprehensive list of affordances in 3D VWs for designers to think about when designing e-commerce systems. The long-term goal is to provide guidance on how to best use these affordances to create positive experiences. Some affordances identified so far are the ability to: • navigate through 3D environments facilitated by the spatial metaphor in a 3D VW • browse pre-arranged displays similar to a real-world store • interact with others in real-time as avatars • blend the 3D virtual world experience with 2D websites Through further analysis, a set of use qualities and their design implications will be derived. Use qualities relate to emotional aspects (sensations, feelings, meaning-making) [9]. For example, some use qualities that characterize the 3D virtual world experience are: Page 106 of 125
    2010 CRC PhDStudent Conference • Disembodied presence: presence and interaction in VWs requires a combination of interaction metaphors, some from avatar-centred (or game-based) interactions and some from pointer-based (WIMP-desktop) interactions. • Touristy shopping: VWs are still a relatively new technology. Consumers are open to the idea of simply enjoying the sights and sounds through visiting new store. The element of discovery and wonder partly contributes to the positive feelings associated with the shopping experience. • Effortful: consumers perceive the shopping experience as requiring non-trivial effort. This may be due to the difficulty of finding stores or the time required to travel through the virtual world because of ‘lag’. The way that consumers describe shopping experience in 3D VWs suggests shopping is more difficult in VWs compared to shopping on websites. • Socially situated: consumers are not alone in VWs. The motivation and consequence of consumer’s actions are influenced by their social network and activity. For example, consumers often choose to buy products because they see someone else with the product. Or, they buy products so that they can share it with others in the virtual world. Further Work The second and third empirical studies will be completed within the next year. The final outcome will be design guidelines for usability of e-commerce in VWs and on websites. Additionally, the guidelines will address how to integrate 3D and 2D e-commerce environments for a positive and seamless consumer experience. The outcome of this research will benefit designers and marketers by providing guidance and a framework for designing positive e-commerce experiences. Consumers will also benefit by having e-commerce systems that meet their requirements and address their expectations. References 1. Braun, V. and Clarke, V. Using thematic analysis in psychology. Qualitative research in psychology 3(2), 2006, 77–101. 2. Castronova, E. Synthetic Worlds - The Business and Culture of Online Games. University of Chicago Press, London, 2005. 3. Gabbott, M. and Hogg, G. Consumers and Services. Wiley UK, 1998. 4. Giorgi, A. P. and Giorgi, B. Phenomenological psychology. In Willig, C. and Rogers. W.S. eds. The SAGE Handbook of Qualitative Research in Psychology. SAGE Ltd, London, 2008. 5. Goel, L. and Prokopec, S. If you build it will they come?—An empirical investigation of consumer perceptions and strategy in VWs. Electronic Commerce Research, 9(2), 115-134. 6. Hale, T. 2009 End of Year Second Life Economy Wrap up (including Q4 Economy in Detail). Retrieved March 10, 2010, from Second Life Official Blog: http://blogs.secondlife.com/community/features/blog/2010/01/19/2009-end-of-year-second-life- economy-wrap-up-including-q4-economy-in-detail. 7. Hemp, P. Are You Ready for E-tailing 2.0? Harvard Business Review 84, 1028-29. 8. Jiang, Z. and Benbasat, I. Virtual Product Experience: Effects of Visual and Functional Control of Products on Perceived Diagnosticity and Flow in Electronic Shopping. Journal of Management Information Systems, 21(3), 111-147. 9. Löwgren, J. and Stolterman, E. Thoughtful Interaction Design. The MIT Press, Cambridge, MA, 2004. 10. Maamar, Z. Commerce, E-Commerce, and M-Commerce: What Comes Next? Communications of the ACM 46, 12, 2003, 251-257. 11. Petre, M., Minocha, S. and Roberts, D. Usability Beyond the Website: an Empirically-Grounded E-commerce Evaluation Instrument for the Total Customer Experience. Behaviour and Information Technology, 25(2), 189-203. 12. 
Rymaszewski, M., Au, W. J., Ondrejka, C., Platel, R., Gorden, S. V., Cezanne, J., Batston- Cunningham, B., Krotoski, A., Trollop, C. and Rossignol, J. Second Life: The Official Guide (Second Ed.). Wiley Publishing Inc, Indiana, 2008. 13. Taylor, T. Living Digitally: Embodiment in VWs. In R. Schroeder, The Social Life of Avatars: Presence and Interaction in Shared Virtual Environments. Springer-Verlag London Ltd., London, 2002, 40–62. Note: All studies involving participants has been approved by The Open University’s Human Participants and Materials Ethics Committee (HPMEC). The study protocol is consistent with guidelines from the British Psychological Association (http://www.bps.org.uk) and Second Life Community Standards (http://secondlife.com/corporate/cs.php). Page 107 of 125
    2010 CRC PhDStudent Conference Supporting Reflection about Web Resources within Mash- Up Learning Environments Thomas Daniel Ullmann t.ullmann@open.ac.uk Supervisors Peter Scott Fridolin Wild Department/Institute Knowledge Media Institute – The Open University Status Fulltime Probation viva Before Starting date October 2010 The proposed PhD thesis addresses the problem of how to empower users to reflect about resources, helping them to make informed decisions. The goal of the PhD is to develop a framework of a mash-up learning environment that takes into account the reflection of users about resources. Mashups are usually seen as software applications that merge separate APIs or data sources (Zang, Rosson, and Nasser 2008). They compose new applications based on existing data services and user interfaces. Mashups are “a combination of pre-existing, integrated units of technology, glued together to achieve new functionality, as opposed to creating that functionality from scratch” (Hartmann, Doorley, and Klemmer 2006). They are the manifestation of the programmable web (Maximilien, Ranabahu, and Gomadam 2008). Learners looking at the wealth of available learning resources need strategies to deal with its complexity. The abilities to reflect about information, to rate, and to review it, seem to be important skills to cope with this. Many tools are available on the web addressing these challenges. For example, search engines, one of the major backbones of the web, deliver a ranked result set of more or less relevant information. Recommendation services aggregate opinions of users to top lists of items or use collaborative filtering mechanisms to make predictions about future interests of users. While these services lack connectivity and do not explicitly address reflective practice, Mashup Personal Learning Environments (MUPPLEs) (Wild, Mödritscher, and Sigurdarson 2008) enable learners to construct their own learning space through facilities to mash up services and tools from different sources to support collaborative and individual learning activities. Research carried out in the context of reflection (e.g. (Dewey 1933); (Boud, Keogh, and Walker 1985); (Schön 1983); (Moon 1999)(Moon 2004)) finds its application in mash-up personal learning environments in form of indicators (Glahn 2009). Indicators are usually small widgets embedded in a learning system, which represents information for the learners for example about their activity level or performance measure. While indicators focus on the visualization of interaction footprints, methods Page 108 of 125
2010 CRC PhD Student Conference

coming from evaluation research (Thierau and Wottawa 1990), especially from qualitative (Stake, Denzin, and Lincoln 2005) and quantitative (Neuman 2005) research, are considered as possible reflection points about (web) resources. The goal is to provide users with these functionalities in a mash-up environment. In order to support reflection about a topic, the proposed system takes into account manually added indicators as well as automatically added criteria fostering reflection. The latter are partly derived from the data services and tools of the Science 2.0 infrastructure ((Wild and Ullmann 2010) and (Wild and Ullmann 2009)) for researchers in technology-enhanced learning.

References:

Boud, David, Rosemary Keogh, and David Walker. 1985. Reflection: Turning Experience into Learning. Routledge, April 1.
Dewey, J. 1933. How we think: A restatement of the relation of reflective thinking to the educative process. DC Heath Boston.
Glahn, Christian. 2009. Contextual support of social engagement and reflection on the Web. http://dspace.ou.nl/handle/1820/2062.
Hartmann, Björn, Scott Doorley, and Scott R Klemmer. 2006. Hacking, Mashing, Gluing: A Study of Opportunistic Design and Development. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.66.1603.
Maximilien, E.M., A. Ranabahu, and K. Gomadam. 2008. An Online Platform for Web APIs and Service Mashups. Internet Computing, IEEE 12, no. 5: 32-43. doi:10.1109/MIC.2008.92.
Moon, Jennifer A. 1999. Reflection in learning & professional development. Routledge.
———. 2004. A handbook of reflective and experiential learning. Routledge, June 15.
Neuman, W. L. 2005. Social research methods: Quantitative and qualitative approaches. Allyn and Bacon.
Schön, D. A. 1983. The reflective practitioner. Basic Books New York.
Stake, R. E, N. K. Denzin, and Y. S. Lincoln. 2005. The Sage handbook of qualitative research. Sage Thousand Oaks, CA.
Thierau, H., and H. Wottawa. 1990. Lehrbuch Evaluation. Bern, Stuttgart, Toronto: Huber.
Wild, Fridolin, Felix Mödritscher, and Steinn Sigurdarson. 2008. Designing for Change: Mash-Up Personal Learning Environments. eLearning Papers 9. http://www.elearningeuropa.info/files/media/media15972.pdf.
Wild, Fridolin, and T. D. Ullmann. 2009. Science 2.0 Mash-Ups. STELLAR Deliverable 6.3. http://www.stellarnet.eu/kmi/deliverables/20100120_stellar___d6-3.pdf.
———. 2010. The STELLAR Science 2.0 Mash-Up Infrastructure. In Accepted paper for the 10th IEEE International Conference on Advanced Learning Technologies. Sousse, Tunisia.
Zang, Nan, Mary Beth Rosson, and Vincent Nasser. 2008. Mashups: who? what? why? In CHI '08 extended abstracts on Human factors in computing systems, 3171-3176. Florence, Italy: ACM. doi:10.1145/1358628.1358826. http://portal.acm.org/citation.cfm?id=1358826. Page 109 of 125
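The indicator idea described above can be illustrated with a small mash-up style sketch. This is an assumption-laden example, not part of the Science 2.0 infrastructure or any real API: it glues together two invented data sources, a learner's activity log and community ratings of a resource, and renders a short reflection prompt in the spirit of an indicator widget.

# Illustrative sketch only: a hypothetical "reflection indicator" mash-up.
activity_log = {"resource-42": 7}                 # times the learner used the resource
community_ratings = {"resource-42": [5, 2, 4, 2]} # ratings gathered from another service

def reflection_indicator(resource_id: str) -> str:
    uses = activity_log.get(resource_id, 0)
    ratings = community_ratings.get(resource_id, [])
    avg = sum(ratings) / len(ratings) if ratings else None
    # Prompt reflection when the learner's own use diverges from community opinion.
    if uses > 5 and avg is not None and avg < 3.5:
        return (f"You used {resource_id} {uses} times, but others rate it "
                f"{avg:.1f}/5 on average. What makes it valuable to you?")
    return f"{resource_id}: {uses} uses, community average {avg}."

print(reflection_indicator("resource-42"))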
    2010 CRC PhDStudent Conference Local civic governance using online media – a case of consensual problem solving or a recalcitrant pluralism? Rean van der Merwe r.vandermerwe@open.ac.uk Supervisors Anthony Meehan Engin Isin Department/Institute Computing, HCI Centre for citizenship, identities and governance Status Full time Probation viva After Starting date October 2008 This presentation reports on a component of a PhD research project exploring the role of online social media in local governance. It discusses the investigation and analysis of distinct patterns of 'governance conversation' observed on a discussion list that was developed and maintained to support local governance. One interesting finding is that making ‘binding decisions’, which has been seen as a key attribute of deliberative democratic processes (Gutmann & Thompson, 2004), is almost entirely absent from the observed online interactions. Nonetheless, the interactions appear to be relevant and useful to the more broadly deliberative process of local governance. The investigation makes a case study of a small, geographically co-located community - where residents make use of simple online tools to discuss issues of local importance. In this sense, the case study presents an example of "neighbourhood democracy" (Leighninger, 2008). However, it should be distinguished from other examples of online neighbourhood democracy, or more broadly online deliberative governance, where the research focus is on the interaction of citizens with government, and where policy formulation in its various forms is both key object and output of communication. In this instance, the online discussion spaces were conceived, set up and are maintained entirely as a spontaneous volunteer effort by members of the community; formal government, e.g. the city municipality, are neither the object of, nor significant participant in the conversations. Dialogue is between residents and largely concerns how they and their Residents Association might directly resolve local issues. Accordingly, residents understand the problems under discussion well and are often personally affected - and so highly motivated to participate in governance action. Case selection logic follows two principles discussed by Yin (2003) which may initially appear contradictory – the case is both typical of villages and neighbourhoods of a given size that exist throughout the world, and relatively unusual in what appears to be a successful ‘bottom up’ implementation of online media to support local, direct governance. The scope of this study is to investigate the sorts of interaction that practically occur as a result, the relationship between online tools and social action, and the potential impact that the online interactions have on local governance. Page 110 of 125
2010 CRC PhD Student Conference

The study draws on a combination of online discussion archives, field notes and interviews with key participants, and follows an approach based on the Structured Case methodological framework (Carroll & Swatman, 2000). The development of theory has much in common with grounded theory methodology (Heath & Cowley, 2004), though structured case in particular makes provision for an initial conceptual framework, to be refined, extended and tested through grounded observation. The initial framework employed here has two significant components: an understanding of deliberative governance as a much broader process than rational decision-making dialogue; and the recognition that deliberation may equally be valued as instrumental or expressive, a process potentially leading to consensual decision making or to the accommodation of pluralism (Gutmann & Thompson, 2004).

Analysis of the discussion archives presents five patterns of 'governance conversation', all of which play a significant role in local governance within the case community. Given the size and nature of the sample, the analysis does not propose anything near a comprehensive typology. Instead, the patterns are used as a mechanism for analysing and discussing this particular case and the range of contributions within it. Briefly, the five patterns are:

• Announcement – participants share governance information or advertise an event.
• Feedback – participants provide or request information in response to a governance initiative.
• Coordination – participants coordinate a local response to an externally initiated governance process.
• Deliberative mediation – participants informally mediate the direct resolution of local governance problems.
• Deliberative management – participants engage in sustained, pluralist discussion of a complex governance problem.

With reference to the initial theoretical framework, the 'announcement', 'feedback', 'coordination' and 'deliberative mediation' patterns make the most evident instrumental contributions, but also provide less overt expressive contributions. 'Deliberative management' most clearly supports expressive dialogue. In turn, the expressiveness of deliberation appears to be instrumental to the shared understanding required to manage inherently pluralist, complex governance problems. The evidence suggests that the online discussions are driven by a combination of the two modes of interaction, the instrumental and the expressive. The findings support Gutmann and Thompson's (2004) view that a complete framework of deliberative governance must integrate the two perspectives.

Though the investigation does not show evidence of overt decision-making, there is a strong case that the online conversations significantly support governance action. It appears that the online discussions rarely "create" consensus, but are effective in supporting action where some level of implicit consensus exists, as we observed in the 'feedback', 'coordination' and 'deliberative mediation' patterns. Furthermore, online deliberation appeared to be particularly suited to managing the sometimes unavoidable pluralism that complex issues introduce to local governance (Cohen, 1998). The case analysis supported not only that expressive communication online creates mutual respect (Gutmann & Thompson, 2004), but also that it potentially allows participants to Page 111 of 125
2010 CRC PhD Student Conference

identify shared interests with respect to an issue, which makes a mutually acceptable management solution possible. There is a further case that, in the context of local governance, the asynchronous and responsive nature of the online medium (Wellman et al., 2003) is particularly suited to supporting such an ad hoc, pluralist management process.

While this single case study presents a very specific context of deliberation, the patterns of "governance conversation" observed, and the themes underlying the issues to which they pertain, are very possibly common to the deliberations of communities the world over. Further, the online tools used by the case community are relatively unsophisticated, widely used and easily adopted. The case thus points to the potential value of an infrequently investigated context of online deliberation, that of citizen-to-citizen deliberation on geographically local issues, and of a broader conception of the role of the 'online' in local deliberation in particular, where formal decision making is frequently over-privileged in existing research.

When the evolved theoretical frame is applied to the technology supporting governance interaction, it seems that an instrumental view of deliberation predisposes one to an instrumental view of technology, as a "tool" primarily to reduce the coordinative overheads (Cordella, 1997) associated with direct deliberative decision making, and potentially to assist in the process of forming consensus. An expressive view instead encourages the researcher to consider the extent to which technology fulfils a broader social function by "extending" the public sphere (Klein & Huynh, 2004), creating an environment in which the plural values and meanings underlying issues can be understood. Rather than proposing one or the other as "ideal", this project sets out to understand how interaction happens in practice, given the theoretical perspective we have outlined, and what this means for the toolsets we design to support the process.

References

Carroll, J. M., & Swatman, P. A. (2000). Structured-case: a methodological framework for building theory in information systems research. Eur. J. Inf. Syst., 9(4), 235-242.

Cohen, J., & Sabel, C. (1997). Directly Deliberative Polyarchy. European Law Journal, 3(4), 313-340.

Cordella, A., & Simon, K. A. (1997). The Impact of Information Technology on Transaction and Coordination Cost. Paper presented at the Conference on Information Systems Research in Scandinavia.

Gutmann, A., & Thompson, D. F. (2004). Why deliberative democracy? Princeton University Press.

Heath, H., & Cowley, S. (2004). Developing a grounded theory approach: a comparison of Glaser and Strauss. International Journal of Nursing Studies, 41, 141-150.

Klein, H. K., & Huynh, Q. H. (2004). The critical social theory of Jürgen Habermas and its implications for IS research. In J. Mingers & L. P. Willcocks (Eds.), Social Theory and Philosophy for Information Systems (pp. 157-237). Wiley. Page 112 of 125
2010 CRC PhD Student Conference

Leighninger, M. (2008). The promise and challenge of Neighbourhood Democracy. Deliberative Democracy Consortium.

Wellman, B., Quan-Haase, A., Boase, J., Chen, W., Hampton, K., Díaz, I., et al. (2003). The Social Affordances of the Internet for Networked Individualism. Journal of Computer-Mediated Communication, 8(3).

Yin, R. K. (2003). Case study research: Design and methods. London: Sage Publications. Page 113 of 125
    2010 CRC PhDStudent Conference Analysis of conceptual metaphors to inform music interaction designs Katie Wilkie k.l.wilkie@open.ac.uk Supervisors Dr Simon Holland Dr Paul Mulholland Department/Institute Music Computing Status Part-time Probation viva After Starting date April 2008 Music is interwoven through many facets of our daily lives and experiences, from the deep to the trivial. It can be both an art form providing a conduit by which emotions and ideas can be communicated, and a means to communicate personal tastes through, for example, the choice of a mobile ringtone. Despite the ubiquity of music, opportunities for non-experts to interact with music in meaningful ways, to understand it and to affect it by, for example creating and manipulating melodies and harmonic progressions, are limited. Popular and pervasive though music is, understanding and analysing the structural properties of musical artifacts often requires knowledge of domain terminology and notation. Such specialist knowledge is generally restricted to highly trained domain experts who have pursued a path of detailed academic study. In particular, musical concepts such as harmonic progression and voice leading, which make use of a number of different terms and notations to describe various parameters and aspects, can be difficult to understand and describe. Furthermore, providing ways of interacting with music that are sufficiently expressive for experts whilst still being usable by non-experts remains an open challenge. We hypothesise that if we can represent this specialist knowledge in a form that exploits pre-existing and universally held sensory-motor experiences, we will be able to lower some of the barriers to musical expression. Thus we believe that music interactions designed in this manner would lessen the requirement for specialist domain knowledge and be more intuitive to both domain experts and novices alike. The identification of image schemas, exposed through linguistic constructs, provides a promising foundation for this work. Image schemas are defined by Johnson (2005) as “recurring patterns of our sensory-motor experience” where the experiences Johnson is referring to are those of interacting with other bodies, space and forces within our environment. Johnson further hypothesises that these image schemas can be applied to other, often abstract, domains through the creation of conceptual metaphors, enabling us to develop our understanding of more complex abstract concepts. Image schema and conceptual metaphor theories have already been applied to a number of different domains such as arithmetic (Lakoff, Nunez 2007), musical concepts (Saslaw 1996, 1997, Zbikowski 1997a, 1997b, Brower 2000, Larson 1997, Johnson 1997, Johnson, Larson 2003, Eitan, Granot 2006, Eitan, Timmers 2006), user interface design (Hurtienne, Blessing 2007) and music Page 114 of 125
    2010 CRC PhDStudent Conference interaction design (Antle et al. 2008, 2009). In the domain of user interface design for example, Hurtienne and Blessing (2007) carried out experiments attempting to determine whether user interface controls which were configured to support simple conceptual metaphors such as MORE IS UP, a metaphorical extension of the UP-DOWN image schema, would be more intuitive to use. Their results do appear to support this hypothesis to an extent, however only a small number of user interface controls and conceptual metaphors were tested. In the domain of music theory, work by Saslaw (1996, 1997), Zbikowski (1997a, 1997b), Brower (2000), Larson (1997), Johnson (1997, Johnson, Larson 2003) and Eitan et al. (Eitan, Granot 2006, Eitan, Timmers 2006) has used image schemas and conceptual metaphors in an attempt to increase our theoretical understanding of musical concepts. This has yielded promising results indicating that musical concepts can be understood in terms of image schemas and conceptual metaphors. Antle et al. (2008, 2009) designed an interactive sound generation system based on embodied metaphors that allowed users to generate sounds and modify simple sound parameters through body movement. They ran a series of experiments attempting to establish whether this approach to interaction design enhanced the ability of children to learn about sound concepts. Although the results were inconclusive, they did highlight the importance of discoverability of the embodied metaphors used in the interaction model. This research draws upon these works, aiming to establish if the conceptual metaphors elicited from dialogues between musicians discussing various musical concepts can be used to inform interaction designs for communicating information about, expressing and manipulating complex musical concepts such as harmony and melody. Thus, the specific questions this research aims to address are as follows: 1. How can conceptual metaphors aid our understanding of the musical concepts of pitch, melody and harmony? 2. How can the conceptual metaphors identified be used to inform and evaluate the design of music interactions for communicating information about and manipulating pitch, melody and harmony? Methodology In order to address the question of the ways in which conceptual metaphors aid our understanding of the musical concepts of pitch, melody and harmony, we must first identify the conceptual metaphors that experienced musicians use to understand, define and describe such phenomena. A series of studies have been planned involving musicians from both classical and popular traditions. The participants will be provided with musical artifacts in different representation formats (e.g. musical score, audio file and piano roll) and asked to discuss aspects of the artifacts in order to elicit a dialogue which can then be analysed to identify the conceptual metaphors in use. Once a collection of commonly used musical conceptual metaphors has been identified, it is planned to validate these with a wider audience through the use of an online questionnaire. The second research question, regarding the use of conceptual metaphors to evaluate and inform music interaction designs, will be addressed by firstly evaluating a number of existing music interaction designs using the identified Page 115 of 125
2010 CRC PhD Student Conference

musical conceptual metaphors. The results of these evaluations will be used to generate a series of guidelines for designing music interactions. In order to validate the guidelines, example music interactions will be developed based on the guidelines and subsequently evaluated with participants to establish their suitability. A summary of the work plan for these tasks is provided in the table below.

Dates | Task
May 2010 – Dec 2010 | Identify and validate the musical conceptual metaphors used by musicians through a series of studies and an online questionnaire.
Jan 2011 – Apr 2011 | Evaluate existing music interaction designs using the identified musical conceptual metaphors and establish a series of design guidelines/patterns for designing future music interactions.
May 2011 – Dec 2011 | Implement a number of small-scale solutions based on the defined design guidelines and evaluate these solutions to further improve the guidelines.
Jan 2012 – Jun 2013 | Write-up.

At this stage, one study has already been completed (Wilkie, Holland and Mulholland 2009) and further studies are being planned.

Contributions

It is envisaged that this research will make the following contributions to the field:

1. Increased knowledge of how conceptual metaphors aid understanding of musical concepts such as pitch, melody and harmony. This will be achieved by identifying and validating the conceptual metaphors used by musicians when discussing various aspects of music.
2. Some preliminary indication of how different musical representation formats affect and align with the conceptual metaphors elicited during discussions.
3. Improved knowledge of how musical conceptual metaphors can be used to evaluate and inform the design of intuitive music interactions. This will be achieved through the development of a series of design guidelines aimed at helping designers decide on the most appropriate manner of communicating information about, and manipulating, specific musical parameters.

References

ANTLE, A.N., CORNESS, G. and DROUMEVA, M., 2009. Human-computer-intuition? Exploring the cognitive basis for intuition in embodied interaction. International Journal of Arts and Technology, 2(3), 235-254.
ANTLE, A.N., DROUMEVA, M. and CORNESS, G., 2008. Playing with the Page 116 of 125
    2010 CRC PhDStudent Conference sound maker: do embodied metaphors help children learn? Proceedings of the 7th international conference on Interaction design and children, 2008, ACM pp178- 185. BROWER, C., 2000. A cognitive theory of musical meaning. Journal of Music Theory, 44(2), 323-379. EITAN, Z. and GRANOT, R.Y., 2006. How Music Moves: Musical Parameters and Listeners' Images of Motion. Music Perception, 23(3), 221-247. EITAN, Z. and TIMMERS, R., 2006. Beethoven’s last piano sonata and those who follow crocodiles: Cross-domain mappings of auditory pitch in a musical context, Proceedings of the 9th International Conference on Music Perception and Cognition, 2006, pp875-882. HURTIENNE, J. and BLESSING, L., 2007. Design for Intuitive Use - Testing Image Schema Theory for User Interface Design, Proceedings of the 16th International Conference on Engineering Design, 2007, pp1-12. JOHNSON, M., 2005. The philosophical significance of image schemas. In: B. HAMPE and J. GRADY, eds, From Perception to Meaning: Image Schemas in Cognitive Linguistics. Berlin: Walter de Gruyter, pp. 15-33. JOHNSON, M., 1997. Embodied Musical Meaning. Theory and Practice, 22-23, 95-102. JOHNSON, M.L. and LARSON, S., 2003. Something in the Way She Moves- Metaphors of Musical Motion. Metaphor and Symbol, 18(2), 63-84. LAKOFF, G. and NUNEZ, R.E., 2000. Where Mathematics Comes From. Basic Books. LARSON, S., 1997. Musical forces and melodic patterns. Theory and Practice, 22-23, 55-71. SASLAW, J., 1996. Forces, Containers, and Paths: The Role of Body-Derived Image Schemas in the Conceptualization of Music. Journal of Music Theory, 40(2), 217-243. SASLAW, J.K., 1997. Life Forces: Conceptual Structures in Schenker’s Free Composition and Schoenberg's The Musical Idea. Theory and Practice, 22-23, 17-34. WILKIE, K., HOLLAND, S. and MULHOLLAND, P., 2009. Evaluating Musical Software Using Conceptual Metaphors, Proceedings of the 23rd British Computer Society Conference on Human Computer Interaction, 2009, British Computer Society pp232-237. ZBIKOWSKI, L.M., 1997a. Conceptual Models and Cross-Domain Mapping: New Perspective on Theories of Music and Hierarchy. Journal of Music Theory, 41(2), 193-225. ZBIKOWSKI, L.M., 1997b. Des Herzraums Abschied: Mark Johnson's Theory of Embodied Knowledge and Music Theory. Theory and Practice, 22-23, 1-16. Page 117 of 125
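To make the planned analysis step concrete, the following is a minimal sketch rather than the study's actual tooling: it tallies conceptual-metaphor annotations from musicians' dialogue by the image schema they draw on, for a given musical concept. The utterances, codes and the schema_counts helper are invented for illustration only.

# Illustrative sketch: tallying image-schema codes behind musicians' metaphors.
from collections import Counter

# (utterance, musical concept, image schema underlying the metaphor)
annotations = [
    ("the melody climbs and then falls back", "melody", "UP-DOWN"),
    ("the harmony pulls you towards the tonic", "harmony", "FORCE"),
    ("we move through the chord progression", "harmony", "SOURCE-PATH-GOAL"),
    ("that note sits inside the chord", "harmony", "CONTAINER"),
    ("the pitch rises at the end of the phrase", "pitch", "UP-DOWN"),
]

def schema_counts(concept: str) -> Counter:
    """Count which image schemas are used when discussing a given concept."""
    return Counter(schema for _, c, schema in annotations if c == concept)

print(schema_counts("harmony"))   # which schemas dominate talk about harmony

A tally like this is only a starting point; in the research described above, the validated metaphors, not raw counts, would feed into the design guidelines.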
2010 CRC PhD Student Conference

Issues and techniques for collaborative music making on multi-touch surfaces

Anna Xambó
a.xambo@open.ac.uk

Supervisors: Robin Laney
Department/Institute: Department of Computing
Status: Visiting research student (4 months)
Probation viva: -
Starting date: -

A range of applications exist for collaborative music making on multi-touch surfaces. Some of them have been highly successful, but there is currently no systematic way of designing them to maximize collaboration for a particular user group. We are especially interested in applications that will engage both novices and experts; traditionally, the challenge with collaborative musical instruments is to satisfy the needs of both [1]. For that purpose, we developed a collaborative music making prototype for multi-touch surfaces and evaluated its creative engagement.

Applications for musical multi-touch surfaces are not new. A pioneering work is the ReacTable [2, 3], which allows a group of people to share control of a modular synthesizer by manipulating physical objects on a round table. Iwai's Composition on the Table [4] allows users to create music and visuals by interacting with four tables which display switches, dials, turntables and sliders. The Stereotronic Multi-Synth Orchestra [5] uses a multi-touch interface based on a concentric sequencer on which notes can be placed. What is less often addressed is the evaluation of creative engagement in these applications. There are numerous theoretical accounts of the nature of emotional engagement with art and artefacts. Current models are based on a pragmatist view, which conceptualises the aesthetic and affective value of an object as lying not in the object itself, but in an individual's or a group's rich set of interactions with it [6, 7]. In the context of pleasurable creative engagement and the collective composition of music, Bryan-Kinns et al. [8] see attunement to others' contributions as the central principle of creative engagement. The phenomenon of full personal immersion in an activity, also known as 'flow' [7], has been extended to groups as a means of heightening group productivity [9].

Our approach is, first, to study the issues and techniques of multi-user instruments and multi-touch applications in general; second, to design a simple application in an initial attempt to clearly analyse some of these issues; and third, to evaluate its creative engagement. For that purpose, a prototype was built which allowed groups of up to four users to express themselves in collaborative music making using pre-composed materials. By keeping the prototype minimal, we were able to investigate the essential aspects of engaging interaction. Case studies were video recorded and analysed using two techniques derived from Grounded Theory (GT) and Content Analysis (CA). For the GT, which is a qualitative research method employed in the social sciences that derives theoretical explanations from the data without having hypotheses in mind [10], we adopted an open coding strategy of identifying key moments of the video Page 118 of 125
2010 CRC PhD Student Conference

interactions; grouping the codes into concepts; and generating general explanations from the categorization of the concepts. Given that this approach is based on creative interpretation, we added further evidence by complementing GT with CA. Content Analysis (CA) is defined by Holsti (1969) as "any technique for making inferences by objectively and systematically identifying specified characteristics of messages" [10]. This definition includes content analysis of text, videos, music or drawings. There are varied approaches to CA using quantitative, qualitative or mixed techniques. Our approach is derived from ethnographic content analysis, or qualitative content analysis [11], an approach to documents that emphasises the role of the investigator in the construction of the meaning of texts. We took the same steps as in the open coding, but in the first step we instead used structured codes to help us identify key points of the video-recorded interactions.

The case study protocol was as follows: the users were expected to perform three musical tasks of different character, followed by an informal discussion, in order to generate sufficient data to analyse several aspects of behaviour with the prototype. A questionnaire was also administered and evaluated. The main focus of the analysis was the evaluation of the collaborative interactions enabled by the prototype. The questions we wanted to address were: 1. What modes did participants find to collaborate with one another? 2. What difficulties did participants encounter, and to what extent did they find the exercise engaging? 3. What was the degree of satisfaction with the end result?

From the transcription of the video speech and behaviours, and the subsequent process of open coding, we identified the following concepts: collaboration, musical aesthetics, learning process and system design. After that, we analysed the same data using the nomenclature of two existing theoretical frameworks. The first is a general framework of tangible social interaction which includes the concepts of tangible manipulation, spatial interaction, embodied facilitation and expressive representation [12]. The second focuses on the engagement between participants in music collaboration and considers the following features: mutual awareness, shared and consistent representations, mutual modifiability and annotation [8]. We found that some of the content analysed had already been discussed in the open coding process, which provides consistency.

Data was also collected using a questionnaire, which was designed to probe such issues as how aware each participant had been of the other instruments; the difficulty of the tasks, and how much they felt they had enjoyed and concentrated on them; and the extent to which they considered they had operated as a team and felt part of a collaborative process. Responses were recorded using numerical scores, but the questionnaire also asked for qualitative feedback on how participants organised themselves as a group and the nature of any rules they created. We also recorded anonymously the participants' age, gender, previous experience, love of music, and the instrument they had been allocated on the table.

Within a user-centred design approach of active user participation in designing the prototype, the most prominent aspects that have emerged as enhancements of multi-touch applications for music collaboration are:

• Responsiveness. The responsiveness determines the perceived emotiveness. This parameter should be adequately related to the Page 119 of 125
  • 127.
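For illustration only, and not the prototype's actual implementation, the minimal Python sketch below shows one way a tabletop instrument might combine per-player individual controls with a shared control while giving immediate feedback on each touch; the class and control names are hypothetical.

    class Control:
        def __init__(self, name, owner=None):
            self.name = name
            self.owner = owner    # None means the control is shared by all players
            self.value = 0.0

        def on_touch(self, player, value):
            # Individual controls respond only to their owner; shared controls respond to anyone
            if self.owner is None or self.owner == player:
                self.value = value
                self.give_feedback(player)

        def give_feedback(self, player):
            # Immediate, consistent audiovisual feedback supports perceived responsiveness;
            # here it is reduced to a console message
            print(f"{self.name} set to {self.value:.2f} by player {player}")

    # One shared tempo control plus one individual loop-volume control per player
    controls = [Control("tempo")] + [Control(f"loop_volume_{p}", owner=p) for p in range(4)]

    controls[0].on_touch(player=2, value=0.8)   # shared: any player may change the tempo
    controls[1].on_touch(player=2, value=0.5)   # individual: ignored, belongs to player 0

In such a design the shared control supports mutual awareness, while the per-player controls preserve individual musical identity and expression.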
The findings of this study help us understand engagement in music collaboration. Qualitative video analysis and the questionnaires indicate that participants had mutually engaging interaction, in terms of being engaged with the music collaboratively produced and also being engaged with others in the activity. A high degree of satisfaction with the end result is evidenced mostly by the gestural mode. The evidence found of participants constantly exchanging ideas indicates that the prototype strongly facilitates conversation, which, as noted earlier, is important in terms of group productivity.

In the future, we are interested in how many, and what type of, affordances such applications should offer in order to maximise engagement. We are also interested in validating our evaluation method. To that end, there is scope to improve the responsiveness of the prototype and to redesign the distribution of shared and individual controls. Furthermore, there is a plan to add individual continuous controls for sound parameter modifications in order to encourage a process-oriented composition. The mutual experience might be enhanced, and collaboration deepened, by adding common controls as well. A balance between adding more features and keeping simplicity must be kept in order to attract novices and experts alike.

References
[1] T. Blaine and S. Fels, "Collaborative musical experiences for novices," Journal of New Music Research, vol. 32, no. 4, pp. 411–428, 2003.
[2] S. Jordà, M. Kaltenbrunner, G. Geiger, and R. Bencina, "The reacTable*," in Proceedings of the International Computer Music Conference (ICMC 2005), Barcelona, Spain, 2005.
[3] S. Jordà, G. Geiger, M. Alonso, and M. Kaltenbrunner, "The reacTable: Exploring the synergy between live music performance and tabletop tangible interfaces," in TEI '07: Proceedings of the 1st International Conference on Tangible and Embedded Interaction, New York, NY, USA, pp. 139–146, ACM, 2007.
[4] T. Iwai, "Composition on the table," in International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), ACM, 1999.
[5] http://www.fashionbuddha.com/, accessed 15/3/2010.
[6] M. Blythe and M. Hassenzahl, The Semantics of Fun: Differentiating Enjoyable Experiences, pp. 91–100, Norwell, MA, USA: Kluwer Academic Publishers, 2004.
[7] M. Csikszentmihalyi, Beyond Boredom and Anxiety: Experiencing Flow in Work and Play. Jossey-Bass, 1975.
[8] N. Bryan-Kinns and F. Hamilton, "Identifying mutual engagement," Behaviour and Information Technology, 2009.
[9] K. Sawyer, Group Genius: The Creative Power of Collaboration. Basic Books, 2007.
[10] J. Lazar, J. Feng, and H. Hochheiser, Research Methods in Human-Computer Interaction. Wiley, 2010.
[11] D. L. Altheide, "Ethnographic content analysis," Qualitative Sociology, vol. 10, pp. 65–77, 1987.
[12] E. Hornecker and J. Buur, "Getting a grip on tangible interaction: A framework on physical space and social interaction," in CHI '06: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, New York, NY, USA, pp. 437–446, ACM Press, 2006.
A Release Planning Model to Handle Security Requirements

Saad Bin Saleem
Centre for Research in Computing, The Open University
s.b.saleem@open.ac.uk

Basic information
Supervisors: Dr Charles Haley, Dr Yijun Yu, Professor Bashar Nuseibeh, Professor Anne De Roeck
Department: Computing
Status: Full-time research student
Probation Viva: Probably in November 2010
Starting Date: Joined the OU on 1st February 2010

Background
Nowadays the use of computer technology is growing rapidly, and almost everybody in the world depends on computer systems [1]. More and more people and organisations are using computer systems to process, store and manage their highly sensitive data [2]. Any loss, theft or alteration of this data can cause a serious incident, which may in turn lead to human disaster. Proper security of computer systems is therefore very important to avoid such events. Software is an important component of any computer system, and a software security failure can cause the overall system to malfunction [1]. Many scientists and engineers report that software security problems have been increasing over the years and that secure software development remains a challenging area for the software community [3, 4].

For the development of secure software, an early inclusion of security concerns in the Software Development Life Cycle (SDLC) is suggested by many researchers [1, 4]. They consider that this is very helpful for improving overall software security and can be useful for addressing common security threats at the design and architecture level [1, 4]. For this purpose, understanding security requirements at the early stages of the SDLC is very important, as security requirements are ignored in most cases [5, 6]. Software security is often considered to be largely a matter of confidentiality, availability and integrity [7], but in some cases security is much more than that and depends on many other constraints, such as stakeholders [6, 7]. To elicit all kinds of security requirements, a systematic process named Security Requirements Engineering (SRE) is suggested in the literature [5]. This process ensures that elicited security requirements are complete, consistent and easy to understand [5].

A Requirements Engineering (RE) process consists of many stages, from elicitation to requirements validation and Release Planning (RP). RP is considered an important phase of RE in both bespoke and market-driven software development. RP is divided into two major subtypes, strategic RP and operational RP [9, 12]. Selecting an optimum set of features or requirements to deliver in a release is called strategic RP, or road-mapping, and is performed at the product level [9, 10]. Allocating resources for the realisation of a product, on the other hand, is called operational RP and is performed to decide when a product release should be delivered [10]. In the RP process, it is common to select as many functional requirements or features as possible for a release and to deliver them to the customer or market as soon as possible [11]. In this way, there is a chance of compromising some quality requirements in general, and security requirements in particular, which consequently exposes the software to many threats [15]. Some existing RP models deal with quality requirements as general technical (hard) constraints, but do not specifically consider these requirements for prioritisation alongside other functional requirements [9, 11, 12, 15].
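To make the selection problem concrete, the following minimal Python sketch treats security requirements as items prioritised alongside functional requirements under a release capacity, rather than as hard constraints. The requirement names, effort figures, weighting and the deliberately simplified greedy heuristic are hypothetical illustrations, not any published RP model or the model this research will develop.

    from dataclasses import dataclass

    @dataclass
    class Requirement:
        name: str
        value: float          # stakeholder value of delivering the requirement
        cost: float           # estimated effort (person-days)
        is_security: bool = False

    # Hypothetical candidate requirements for the next release
    requirements = [
        Requirement("export report", value=8, cost=5),
        Requirement("single sign-on", value=6, cost=4, is_security=True),
        Requirement("encrypt stored data", value=7, cost=6, is_security=True),
        Requirement("dashboard widgets", value=5, cost=7),
    ]

    CAPACITY = 12             # effort available for the release (operational constraint)
    SECURITY_WEIGHT = 1.5     # boost so security requirements are not crowded out

    def priority(r):
        # Security requirements are prioritised alongside functional ones,
        # rather than being handled only as separate hard constraints
        weight = SECURITY_WEIGHT if r.is_security else 1.0
        return (weight * r.value) / r.cost

    selected, used = [], 0.0
    for r in sorted(requirements, key=priority, reverse=True):
        if used + r.cost <= CAPACITY:
            selected.append(r.name)
            used += r.cost

    print(selected, used)     # ['single sign-on', 'encrypt stored data'] 10.0

A full model would, of course, also need to capture dependencies between requirements, stakeholder priorities and operational concerns such as release timing.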
Therefore, identifying and fixing any security concerns during the selection of requirements for a release, and before deciding the delivery time, can make software less prone to security failures. It can also help in delivering security incrementally, as organisations can never claim a software product to be one hundred percent secure and always need to improve it further. Based on the above discussion, we observe that security requirements need to be considered in RP for better product strategies and for the delivery of secure software to customers. There is thus a need to align security requirements with RP by developing a model that treats security requirements separately for strategic and operational RP in order to release secure software.

Current research in SRE aims to improve existing methods to elicit, analyse, specify, validate and manage security requirements [3, 13]. For example, Haley et al. have proposed a framework for eliciting security requirements and highlighted some further research directions in the area [3]. Similarly, in RP, Ruhe et al. have extended the existing Evolve+ approach with three parameters (time-dependent value functions, flexible release dates, and adjusted time-dependent resource capacities) for improved planning [16]. Saleem and Shafique identified the need to improve existing RP models according to the needs of industry [8]. This study will therefore contribute to SRE and RP research, as its purpose is to develop a model that treats security requirements in conjunction with functional requirements for strategic and operational RP. The research will be conducted in three phases. In the first phase, the impact of security requirements on strategic and operational RP will be analysed. In the second phase, a model will be developed based on the results of the first phase. In the third phase, the developed model will be validated to verify its effectiveness.

Research Questions
The following preliminary research questions are based on the purpose of the study.
RQ1. What practices exist in the literature for dealing with security requirements in strategic and operational RP?
RQ2. What are the implications of security requirements for strategic and operational RP compared to functional requirements and/or other quality requirements?
RQ3. What is an appropriate mechanism for developing a model that treats security requirements as separate requirements rather than as constraints on the prioritisation of functional requirements?
RQ4. What other kinds of constraints should the model consider for strategic and operational RP?
RQ5. To what extent is the proposed model effective?

Research Methodology
Qualitative and quantitative research methodologies will be used to conduct the research in two stages [14]. A literature review and industrial interviews will be used as strategies of inquiry in the first stage of the research.
For example, the literature review will be used to identify existing practices for dealing with security requirements during strategic and operational RP, to analyse existing models of strategic and operational RP, and to identify any constraints that should be considered in strategic and operational RP based on security and all other kinds of requirements. Similarly, industrial interviews will be used alongside the literature review to identify any implications of security requirements for strategic and operational RP. In the second stage of the research, industrial interviews and experiments will be adopted as strategies of inquiry to validate the model's functionality.
References
[1] G. McGraw, "Software security," IEEE Security & Privacy, 2004.
[2] C. Irvine, T. Levin, J. Wilson, D. Shifflet, and B. Peireira, "An approach to security requirements engineering for a high assurance system," Requirements Engineering Journal, vol. 7, no. 4, pp. 192–206, 2002.
[3] C. B. Haley, R. Laney, J. Moffett, and B. Nuseibeh, "Security requirements engineering: A framework for representation and analysis," IEEE Transactions on Software Engineering, vol. 34, no. 1, pp. 133–153, 2008.
[4] R. Hassan, S. Bohner, and S. El-Kassas, "Formal derivation of security design specifications from security requirements," in Proceedings of the 4th Annual Workshop on Cyber Security and Information Intelligence Research, pp. 1–3, 2008.
[5] D. Mellado, E. Fernández-Medina, and M. Piattini, "Applying a security requirements engineering process," in Computer Security – ESORICS, Springer, pp. 192–206, 2006.
[6] B. H. Cheng and J. M. Atlee, "Research directions in requirements engineering," in Future of Software Engineering (FOSE '07), pp. 285–303, 2007.
[7] A. Avizienis, J. C. Laprie, B. Randell, and C. Landwehr, "Basic concepts and taxonomy of dependable and secure computing," IEEE Transactions on Dependable and Secure Computing, vol. 1, no. 1, pp. 11–33, 2004.
[8] S. B. Saleem and M. U. Shafique, "A study on strategic release planning models of academia and industry," Master's thesis, Blekinge Institute of Technology, Sweden, pp. 1–81, 2008.
[9] A. Al-Emran and D. Pfahl, "Operational planning, re-planning and risk analysis for software releases," in Proceedings of the 8th International Conference on Product Focused Software Process Improvement (PROFES), pp. 315–329, 2007.
[10] G. Ruhe and J. Momoh, "Strategic release planning and evaluation of operational feasibility," in Proceedings of the 38th Annual Hawaii International Conference on System Sciences (HICSS), vol. 9, p. 313b, 2005.
[11] I. A. Tondel, M. G. Jaatun, and P. H. Meland, "Security requirements for the rest of us: A survey," IEEE Software, vol. 25, no. 1, pp. 20–27, 2008.
[12] A. Ngo-The and G. Ruhe, "A systematic approach for solving the wicked problem of software release planning," Soft Computing, vol. 12, no. 1, pp. 95–108, 2007.
[13] J.-S. Cui and D. Zhang, "The research and application of security requirements analysis methodology of information systems," in 2nd International Conference on Anti-counterfeiting, Security and Identification, pp. 30–36, 2008.
[14] J. W. Creswell, Research Design: Qualitative, Quantitative, and Mixed Method Approaches, 2nd ed., Thousand Oaks: Sage, pp. 1–246, 2003.
[15] M. Svahnberg, T. Gorschek, R. Feldt, R. Torkar, S. B. Saleem, and M. U. Shafique, "A systematic review on strategic release planning models," Information and Software Technology, vol. 52, no. 3, pp. 237–248, 2010.
[16] J. Elroy and G. Ruhe, "When-to-release decisions for features with time-dependent value functions," to appear in Requirements Engineering, 2010.