CRC Conference proceedings



Proceedings from the Centre for Research in Computing PhD Student Conference

Proceedings of the 2010 CRC PhD Student Conference

Centre for Research in Computing
The Open University
Milton Keynes

June 3 and 4, 2010
Centre for Research in Computing
The Open University
Milton Keynes, UK

Conference organization: Marian Petre, Robin Laney, Mathieu D'Aquin, Paul Piwek, Debbie Briggs

May 2010

Proceedings compiled by Paul Piwek
Table of Contents

Mihhail Aizatulin: Verifying Implementations of Security Protocols in C ......... 1
Simon Butler: Analysing Semantic Networks of Identifier Names to Improve Source Code Maintainability and Quality ......... 5
Tom Collins: Discovering Translational Patterns in Symbolic Representation of Music ......... 9
Joe Corneli: Semantic Adaptivity and Social Networking in Personal Learning Networks ......... 12
Richard Doust: Investigating narrative "effects": the case of suspense ......... 15
Francois Dupressoir: Verifying Authentication Properties of C Security Protocol Code Using General Verifiers ......... 19
Jennifer Ferreira: Agile development and usability in practice: Work cultures of engagement ......... 23
Michael Giddings: A Model Driven Architecture of Large Distributed Hard Real Time Systems ......... 26
Alan Hayes: An Investigation into Design Diagrams and Their Implementations ......... 30
Robina Hetherington: An Investigation into Interoperability of Data Between Software Packages used to support the Design, Analysis and Visualisation of Low Carbon Buildings ......... 33
Chris Ireland: Understanding Object-Relational Impedance Mismatch: A Framework Based Approach ......... 37
Lukasz Jedrzejczyk: "Privacy Shake", a Haptic Interface for Managing Privacy Settings in Mobile Location Sharing Applications ......... 41
Stefan Kreitmayer: Designing a Climate Change Game for Interactive Tabletops ......... 45
Tamara Lopez: Reasoning about Flaws in Software Design: Diagnosis and Recovery ......... 47
Lin Ma: Presupposition Analysis in Requirements ......... 51
Lionel Montrieux: Merging Verifiable and Evolving Access Control Properties ......... 55
Sharon Moyo: Effective Tutoring with Affective Embodied Conversational Agents ......... 58
Brendan Murphy: Evaluating a mobile learning environment in a home car domain ......... 60
Tu Anh Nguyen: Generating Accessible Natural Language Explanations for OWL Ontologies ......... 65
Chwhynny Overbeeke: Supporting the Exploration of Research Spaces ......... 69
Nadia Pantidi: Understanding technology-rich learning spaces ......... 74
Aleksandra Pawlik: How best to support scientific end-user software development? ......... 78
Brian Pluss: Non-Cooperation in Computational Models of Dialogue ......... 82
Ivana Quinto: A Debate Dashboard to Support the Adoption of On-line Argument Mapping Tools ......... 86
Adam Rae: Supporting multimodal media recommendation and annotation using social network analysis ......... 91
Rien Sach: The effect of Feedback ......... 95
Stefan Taubenberger: Using Business Process Security Requirements for IT Security Risk Assessment ......... 98
Keerthi Thomas: Distilling Privacy Requirements for Mobile Applications ......... 102
Min Q. Tran: Understanding the Influence of 3D Virtual Worlds on Perceptions of 2D E-commerce Websites ......... 104
Thomas Daniel Ullmann: Supporting Reflection about Web Resources within Mash-Up Learning Environments ......... 108
Rean van der Merwe: Local civic governance using online media – a case of consensual problem solving or a recalcitrant pluralism ......... 110
Katie Wilkie: Analysis of conceptual metaphors to inform music interaction designs ......... 114
Anna Xambo: Issues and techniques for collaborative music making on multi-touch surfaces ......... 118
Saad Bin Saleem: A Release Planning Model to Handle Security Requirements ......... 122
2010 CRC PhD Student Conference

Verifying Implementations of Security Protocols in C

Mihhail Aizatulin

Supervisors: Dr Andrew Gordon, Dr Jan Jürjens, Prof Bashar Nuseibeh
Department: Computing
Status: Full-time
Probation viva: Passed
Starting date: November 2008

Our goal is the verification of cryptographic protocol implementations (such as OpenSSL or Kerberos), motivated by the desire to minimise the gap between verified and executable code. Very little has been done in this area. There are numerous tools to find low-level bugs in code (such as buffer overflows and division by zero) and there are verifiers for cryptographic protocols that work on fairly abstract descriptions, but so far very few attempts have been made to verify cryptographic security directly on the code, especially for low-level languages like C.

We attempt to verify the protocol code by extracting an abstract model that can be used in high-level cryptographic verification tools such as ProVerif or CryptoVerif. This is the first such approach that we are aware of. Currently we are investigating the feasibility of the approach by extracting the model from running code, using so-called concolic (concrete + symbolic) execution. We run the protocol implementation normally, but at the same time we record all the operations performed on binary values and then replay those operations on symbolic values. The resulting symbolic expressions reveal the structure of the messages sent to the network and the conditions that are checked for incoming messages.

We are able to produce symbolic execution traces for the handshake implemented in the OpenSSL library. To give an example of what the extracted traces look like, consider a simple request-response protocol, protected by hashing with a shared key:

A → B : m | hash('request' | m, kAB)
B → A : m' | hash('response' | m | m', kAB)

We implemented the protocol in about 600 lines of C code, calling into the OpenSSL cryptographic library.
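As an illustration, the request-response protocol above can be sketched in a few lines of Python, with HMAC-SHA1 standing in for the abstract keyed hash. This is a toy model of the message format, not the 600-line C implementation; the key value and function names are assumptions made for the example.

```python
import hmac
import hashlib

KEY = b"shared-key-kAB"  # stands in for the shared key kAB
TAG_LEN = 20             # HMAC-SHA1 digest length in bytes

def mac(*parts: bytes) -> bytes:
    # hash(p1 | p2 | ..., kAB), with '|' realised as concatenation
    return hmac.new(KEY, b"|".join(parts), hashlib.sha1).digest()

def make_request(m: bytes) -> bytes:
    # A -> B : m | hash('request' | m, kAB)
    return m + mac(b"request", m)

def make_response(m: bytes, m2: bytes) -> bytes:
    # B -> A : m' | hash('response' | m | m', kAB)
    return m2 + mac(b"response", m, m2)

def accept_response(m: bytes, reply: bytes) -> bytes:
    # The client's check on the server response before accepting it
    m2, tag = reply[:-TAG_LEN], reply[-TAG_LEN:]
    if not hmac.compare_digest(tag, mac(b"response", m, m2)):
        raise ValueError("MAC check failed")
    return m2
```

For instance, `accept_response(b"ping", make_response(b"ping", b"pong"))` returns the response payload only when the MAC verifies.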
Our concolic execution tool produces a trace of 8 lines for the client side, shown in Figure 1: we see the client sending the request and checking the condition on the server response before accepting it.

    write(i39)
    payload1 = payload()
    key2 = key()
    write(i14|7c|payload1|HMAC(sha1, i7|7c52657175657374|payload1, key2))
    msg3 = read()
    var4 = msg3{5,23}
    branchF((memcmp(msg3{28,20}, HMAC(sha1, i8|7c526573706f6e7365|i14|7c|payload1|var4, key2)) != i0))
    accept(var4)

Figure 1: An excerpt from the symbolic client trace. X{start, len} denotes the substring of X starting at start, of length len. iN is an integer with value N (width information is omitted), and branchT and branchF are the true or false branches taken by the code.

We are currently working to implement symbolic handling of buffer lengths and sound handling of loops, as well as making the extracted models compatible with those understood by ProVerif and CryptoVerif, in particular simplifying away any remaining arithmetic expressions from the symbolic trace.

One obvious drawback of concolic execution is that it only follows the single path that was actually taken by the code. This is enough to produce an accurate model when there is only one main path; however, libraries like OpenSSL contain multiple nontrivial paths. Thus, to achieve verification of those libraries, we plan to move the analysis towards being fully static in future.

Related Work

One of the earliest security verification attempts directly on code is probably CSur [Goubault-Larrecq and Parrennes, 2005], which deals directly with C protocol implementations. It translates programs into a set of Horn clauses that are fed directly into a general purpose theorem prover. Unfortunately, it never went beyond some very simple implementations and has not been developed since. The work [Jürjens, 2006] describes an approach to translating Java programs in a manner similar to the above.
In our work we try to separate reasoning about pointers and integers from reasoning about cryptography, in the hope of achieving greater scalability.

Some work has been done on verification of functional language implementations, either by translating the programs directly into π-calculus [Bhargavan et al., 2006; Bhargavan et al., 2008] or by designing a type system that enforces security [Bengtson et al., 2008]. Unfortunately, it is not trivial to adapt such approaches to C-like languages.

ASPIER [Chaki and Datta, 2008] uses model checking for verification and has been applied to OpenSSL. However, it does not truly start from C code: any code explicitly dealing with pointers needs to be replaced by abstract summaries that presumably have to be written manually.

Concolic execution is widely used to drive automatic test generation, as in [Cadar et al., 2008] or [Godefroid et al., 2008]. One difference in our concolic execution is that we need to assign symbols to whole bitstrings, whereas the testing frameworks usually assign symbols to single bytes. We believe that our work could be adapted for testing of cryptographic software. Usual testing approaches try to create an input that satisfies a set of equations resulting from checks in the code. In the presence of cryptography such equations will (hopefully) be impossible to solve, so a more abstract model like ours might be useful.

A separate line of work deals with reconstruction of protocol message formats from implementation binaries [Caballero et al., 2007; Lin et al., 2008; Wondracek et al., 2008; Cui et al., 2008; Wang et al., 2009]. The goal is typically to reconstruct the field boundaries of a single message by observing how the binary processes the message. Our premises and goals are different: we have the advantage of starting from the source code, but in exchange we aim to reconstruct the whole protocol flow instead of just a single message. Our reconstruction needs to be sound to enable verification: all possible protocol flows should be accounted for.

References

[Bengtson et al., 2008] Jesper Bengtson, Karthikeyan Bhargavan, Cédric Fournet, Andrew D. Gordon, and Sergio Maffeis. Refinement types for secure implementations. In CSF '08: Proceedings of the 21st IEEE Computer Security Foundations Symposium, pages 17–32, Washington, DC, USA, 2008. IEEE Computer Society.

[Bhargavan et al., 2006] Karthikeyan Bhargavan, Cédric Fournet, Andrew D. Gordon, and Stephen Tse. Verified interoperable implementations of security protocols. In CSFW '06: Proceedings of the 19th IEEE workshop on Computer Security Foundations, pages 139–152, Washington, DC, USA, 2006. IEEE Computer Society.
[Bhargavan et al., 2008] Karthikeyan Bhargavan, Cédric Fournet, Ricardo Corin, and Eugen Zalinescu. Cryptographically verified implementations for TLS. In CCS '08: Proceedings of the 15th ACM conference on Computer and communications security, pages 459–468, New York, NY, USA, 2008. ACM.

[Caballero et al., 2007] Juan Caballero, Heng Yin, Zhenkai Liang, and Dawn Song. Polyglot: automatic extraction of protocol message format using dynamic binary analysis. In CCS '07: Proceedings of the 14th ACM conference on Computer and communications security, pages 317–329, New York, NY, USA, 2007. ACM.

[Cadar et al., 2008] Cristian Cadar, Daniel Dunbar, and Dawson Engler. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In USENIX Symposium on Operating Systems Design and Implementation (OSDI 2008), San Diego, CA, December 2008.

[Chaki and Datta, 2008] Sagar Chaki and Anupam Datta. ASPIER: An automated framework for verifying security protocol implementations. Technical Report 08-012, Carnegie Mellon University, October 2008.

[Cui et al., 2008] Weidong Cui, Marcus Peinado, Karl Chen, Helen J. Wang, and Luis Irun-Briz. Tupni: automatic reverse engineering of input formats. In CCS '08: Proceedings of the 15th ACM conference on Computer and communications security, pages 391–402, New York, NY, USA, 2008. ACM.

[DBL, 2008] Proceedings of the Network and Distributed System Security Symposium, NDSS 2008, San Diego, California, USA, 10th February - 13th February 2008. The Internet Society, 2008.

[Godefroid et al., 2008] Patrice Godefroid, Michael Y. Levin, and David A. Molnar. Automated whitebox fuzz testing. In NDSS [DBL, 2008].

[Goubault-Larrecq and Parrennes, 2005] J. Goubault-Larrecq and F. Parrennes. Cryptographic protocol analysis on real C code. In Proceedings of the 6th International Conference on Verification, Model Checking and Abstract Interpretation (VMCAI'05), volume 3385 of Lecture Notes in Computer Science, pages 363–379. Springer, 2005.

[Jürjens, 2006] Jan Jürjens. Security analysis of crypto-based Java programs using automated theorem provers. In ASE '06: Proceedings of the 21st IEEE/ACM International Conference on Automated Software Engineering, pages 167–176, Washington, DC, USA, 2006. IEEE Computer Society.

[Lin et al., 2008] Zhiqiang Lin, Xuxian Jiang, Dongyan Xu, and Xiangyu Zhang. Automatic protocol format reverse engineering through context-aware monitored execution. In NDSS [DBL, 2008].

[Wang et al., 2009] Zhi Wang, Xuxian Jiang, Weidong Cui, Xinyuan Wang, and Mike Grace. ReFormat: Automatic reverse engineering of encrypted messages. In Michael Backes and Peng Ning, editors, ESORICS, volume 5789 of Lecture Notes in Computer Science, pages 200–215. Springer, 2009.

[Wondracek et al., 2008] Gilbert Wondracek, Paolo Milani Comparetti, Christopher Kruegel, and Engin Kirda. Automatic Network Protocol Analysis. In 15th Symposium on Network and Distributed System Security (NDSS), 2008.
Analysing semantic networks of identifier names to improve source code maintainability and quality

Simon Butler

Supervisors: Michel Wermelinger, Yijun Yu & Helen Sharp
Department/Institute: Centre for Research in Computing
Status: Part-time
Probation viva: After
Starting date: October 2008

Source code is the written expression of a software design, consisting of identifier names – natural language phrases that represent the concepts being manipulated by the program – embedded in a framework of keywords and operators provided by the programming language. Identifiers are crucial for program comprehension [9], a necessary activity in the development and maintenance of software. Despite their importance, there is little understanding of the relationship between identifier names and source code quality and maintainability. Neither is there automated support for identifier management or the selection of relevant natural language content for identifiers during software development.

We will extend current understanding of the relationship between identifier name quality and source code quality and maintainability by developing techniques to analyse identifiers for meaning, modelling the semantic relationships between identifiers, and empirically validating the models against measures of maintainability and software quality. We will also apply the analysis and modelling techniques in a tool to support the selection and management of identifier names during software development, and concept identification and location for program comprehension.

The consistent use of clear identifier names is known to aid program comprehension [4, 7, 8].
However, despite the advice given in programming conventions and the popular programming literature on the use of meaningful identifier names in source code, the reality is that identifier names are not always meaningful, may be selected in an ad hoc manner, and do not always follow conventions [5, 1, 2].

Researchers in the reverse engineering community have constructed models to support program comprehension. The models range in complexity from textual search systems [11], to RDF-OWL ontologies created either solely from source code and identifier names [8], or with the inclusion of supporting documentation and source code comments [13]. The ontologies typically focus on class and method names, and are used for concept identification and location based on the lexical similarity of identifier names. The approach, however, does not directly address the quality of the identifier names used.

The development of detailed identifier name analysis has focused on method names, because their visibility and reuse in APIs implies a greater need for them to contain clear information about their purpose [10]. Caprile and Tonella [3] derived both a grammar and vocabulary for C function identifiers, sufficient for the implementation of automated name refactoring. Høst and Østvold [5] have since analysed Java method names looking for a common vocabulary that could form the basis of a naming scheme for Java methods. Their analysis of the method names used in multiple Java projects found common grammatical forms; however, there were sufficient degenerate forms for them to be unable to derive a grammar for Java method names.

The consequences of identifier naming problems have been considered to be largely confined to the domain of program comprehension. However, Deißenböck and Pizka observed an improvement in maintainability when their rules of concise and consistent naming were applied to a project [4], and our recent work found statistical associations between identifier name quality and source code quality [1, 2]. Our studies, however, only looked at the construction of the identifier names in isolation, and not at the relationships between the meanings of the natural language content of the identifiers. We hypothesise that a relationship exists between the quality of identifier names, in terms of their natural language content and semantic relationships, and the quality of source code, which can be understood in terms of the functionality, reliability, and usability of the resulting software, and its maintainability [6].
Accordingly, we seek to answer the following research question: How are the semantic relationships between identifier names, inferred from their natural language content and programming language structure, related to source code maintainability and quality?

We will construct models of source code as semantic networks predicated on both the semantic content of identifier names and the relationships between identifier names inferred from the programming language structure. For example, the simple class Car in Figure 1 may be represented by the semantic network in Figure 2. Such models can be applied to support empirical investigations of the relationship between identifier name quality and source code quality and maintainability. The models may also be used in tools to support the management and selection of identifier names during software development, and to aid concept identification and location during source code maintenance.

    public class Car extends Vehicle {
        Engine engine;
    }

Figure 1: The class Car

Figure 2: A semantic network of the class Car (Car extends Vehicle; Car has an instance of Engine named engine).

We will analyse identifier names mined from open source Java projects to create a catalogue of identifier structures, to understand the mechanisms employed by developers to encode domain information in identifiers. We will build on the existing analyses of C function and Java method identifier names [3, 5, 8], and anticipate the need to develop additional techniques to analyse identifiers, particularly variable identifier names.

Modelling of both the structural and semantic relationships between identifiers can be accomplished using Gellish [12], an extensible controlled natural language with dictionaries for natural languages – Gellish English being the variant for the English language. Unlike a conventional dictionary, a Gellish dictionary includes human- and machine-readable links between entries to define relationships between concepts – thus making Gellish a semantic network – and to show hierarchical linguistic relationships such as meronymy, an entity–component relationship. Gellish dictionaries also permit the creation of multiple conceptual links for individual entries to define polysemic senses.

The natural language relationships catalogued in Gellish can be applied to establish whether the structural relationship between two identifiers implied by the programming language is consistent with the conventional meaning of the natural language found in the identifier names. For example, a field is implicitly a component of the containing class, allowing the inference of a conceptual and linguistic relationship between class and field identifier names. Any inconsistency between the two relationships could indicate potential problems with either the design or with the natural language content of the identifier names.

We have assumed a model of source code development and comprehension predicated on the idea that it is advantageous for coherent and relevant semantic relationships to exist between identifier names based on their natural language content.
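As a minimal, illustrative sketch (my simplification, not the authors' tooling), the class Car of Figure 1 can be encoded as a semantic network of subject-relation-object triples; the relation labels follow Figure 2 and are assumptions made for the example.

```python
# Semantic network for the class Car as subject-relation-object triples.
car_network = [
    ("Car", "extends", "Vehicle"),
    ("Car", "has instance named", "engine"),
    ("engine", "has a", "Engine"),
]

def relations_from(node, network):
    """All (relation, target) edges leaving a node."""
    return [(rel, obj) for subj, rel, obj in network if subj == node]
```

For example, `relations_from("Car", car_network)` yields the extends and field relationships inferred from the programming language structure, which could then be checked against the linguistic relationships a Gellish dictionary records for the words "car", "vehicle" and "engine".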
To assess the relevance of our model to real-world source code, we will validate the underlying assumption empirically. We intend to mine both software repositories and defect reporting systems to identify source code implicated in defect reports, and to evaluate that source code in terms of the coherence and consistency of models of its identifiers. To assess maintainability, we will investigate how source code implicated in defect reports develops in successive versions – e.g. is the code a continuing source of defects? – and monitor areas of source code modified between versions to determine how well our model predicts defect-prone and defect-free regions of source code.

We will apply the results of our research to develop a tool to support the selection and management of identifier names during software development, as well as modelling source code to support software maintenance. We will evaluate and validate the tool with software developers – both industry partners and FLOSS developers – to establish the value of identifier naming support. While intended for software developers, the visualisations of source code presented by the tool will enable stakeholders (e.g. domain experts) who are not literate in programming or modelling languages (like Java and UML) to examine, and give feedback on, the representation of domain concepts in source code.

References

[1] S. Butler, M. Wermelinger, Y. Yu, and H. Sharp. Relating identifier naming flaws and code quality: an empirical study. In Proc. of the Working Conf. on Reverse Engineering, pages 31–35. IEEE Computer Society, 2009.
[2] S. Butler, M. Wermelinger, Y. Yu, and H. Sharp. Exploring the influence of identifier names on code quality: an empirical study. In Proc. of the 14th European Conf. on Software Maintenance and Reengineering, pages 159–168. IEEE Computer Society, 2010.
[3] B. Caprile and P. Tonella. Restructuring program identifier names. In Proc. Int'l Conf. on Software Maintenance, pages 97–107. IEEE, 2000.
[4] F. Deißenböck and M. Pizka. Concise and consistent naming. Software Quality Journal, 14(3):261–282, Sep 2006.
[5] E. W. Høst and B. M. Østvold. The Java programmer's phrase book. In Software Language Engineering, volume 5452 of LNCS, pages 322–341. Springer, 2008.
[6] International Standards Organisation. ISO/IEC 9126-1: Software engineering – product quality, 2001.
[7] D. Lawrie, H. Feild, and D. Binkley. An empirical study of rules for well-formed identifiers. Journal of Software Maintenance and Evolution: Research and Practice, 19(4):205–229, 2007.
[8] D. Ratiu. Intentional Meaning of Programs. PhD thesis, Technische Universität München, 2009.
[9] V. Rajlich and N. Wilde. The role of concepts in program comprehension. In Proc. 10th Int'l Workshop on Program Comprehension, pages 271–278. IEEE, 2002.
[10] M. Robillard. What makes APIs hard to learn? Answers from developers. IEEE Software, 26(6):27–34, Nov.-Dec. 2009.
[11] G. Sridhara, E. Hill, L. Pollock, and K. Vijay-Shanker. Identifying word relations in software: a comparative study of semantic similarity tools. In Proc. Int'l Conf. on Program Comprehension, pages 123–132. IEEE, June 2008.
[12] A. S. H. P. van Renssen. Gellish: a generic extensible ontological language. Delft University Press, 2005.
[13] R. Witte, Y. Zhang, and J. Rilling. Empowering software maintainers with semantic web technologies. In European Semantic Web Conf., pages 37–52, 2007.
Discovering translational patterns in symbolic representations of music

Tom Collins

Supervisors: Robin Laney, Alistair Willis, Paul Garthwaite
Department/Institute: Centre for Research in Computing
Status: Full-time
Probation viva: After
Starting date: October 2008

RESEARCH QUESTION

How can current methods for pattern discovery in music be improved and integrated into an automated composition system? The presentation will address the first half of this research question: how can current methods for pattern discovery in music be improved?

INTRA-OPUS PATTERN DISCOVERY

Suppose that you wish to get to know a particular piece of music, and that you have a copy of the score of the piece or a MIDI file. (Scores and MIDI files are symbolic representations of music and are the focus of my presentation, as opposed to sound recordings.) Typically, to become familiar with a piece, one listens to the MIDI file or studies/plays through the score, gaining an appreciation of where and how material is repeated, and perhaps also of the underlying structure. The literature contains several algorithmic approaches to this task, referred to as 'intra-opus' pattern discovery [2, 4, 5]. Given a piece of music in a symbolic representation, the aim is to define and evaluate an algorithm that discovers and returns patterns occurring within the piece. Some potential applications for such an algorithm are as follows:

• A pattern discovery tool to aid music students.
• Comparing an algorithm's discoveries with those of a music expert as a means of investigating human perception of music.
• Stylistic composition (the process of writing in the style of another composer or period) assisted by using the patterns/structure returned by a pattern discovery algorithm [1, 3].
TWO IMPROVEMENTS

Current methods for pattern discovery in music can be improved in two ways:

1. The way in which the algorithm's discoveries are displayed for a user can be improved.
2. A new algorithm can be said to improve upon existing algorithms if, according to standard metrics, it is the strongest-performing algorithm on a certain task.

Addressing the first area for improvement, suppose that an algorithm has discovered hundreds of patterns within a piece of music. These must be presented to the user, but in what order? Various formulae have been proposed for rating a discovered pattern, based on variables that quantify attributes of that pattern and of the piece of music in which it appears [2, 4]. To my knowledge, none has been derived or validated empirically. So I conducted a study in which music undergraduates examined excerpts taken from Chopin's mazurkas and were instructed to rate already-discovered patterns, giving high ratings to patterns that they thought were noticeable and/or important. A model relating participants' ratings to the attributes was determined using variable selection and cross-validation. This model leads to a new formula for rating discovered patterns, and the basis for this formula constitutes a methodological improvement.

Addressing the second area for improvement, I asked a music analyst to analyse two sonatas by Domenico Scarlatti and two preludes by Johann Sebastian Bach. The brief was similar to the intra-opus discovery task described above: given a piece of music in staff notation, discover translational patterns that occur within the piece. Thus, a benchmark of translational patterns was formed for each piece, the criteria for benchmark membership being left largely to the analyst's discretion. Three algorithms – SIA [5], COSIATEC [4] and my own, SIACT – were run on the same pieces and their performance was evaluated in terms of recall and precision.
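To make the task concrete, here is a toy sketch of a translational pattern in the point-set style used by the SIA family of algorithms. This is my own simplification, not SIA or SIACT: notes are reduced to (onset, pitch) points, and a pattern "occurs" when the whole point set recurs under a fixed translation vector.

```python
# Notes as (onset, pitch) points; a translational pattern is a point set
# that recurs in the piece under a fixed translation vector (dt, dp).
def translate(pattern, vector):
    (dt, dp) = vector
    return {(t + dt, p + dp) for (t, p) in pattern}

def occurs(pattern, piece, vector):
    """True if the pattern translated by vector lies entirely in the piece."""
    return translate(pattern, vector) <= piece

# An illustrative piece and motif (MIDI pitch numbers, onsets in beats).
piece = {(0, 60), (1, 62), (2, 64), (4, 67), (5, 69), (6, 71)}
motif = {(0, 60), (1, 62), (2, 64)}
```

Here the motif recurs translated by (4, 7), i.e. four beats later and seven semitones higher, so `occurs(motif, piece, (4, 7))` holds while `occurs(motif, piece, (1, 1))` does not.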
If an algorithm discovers x of the y patterns discovered by the analyst, then its recall is x/y. If the algorithm also returns z patterns that are not in the analyst's benchmark, then the algorithm's precision is x/(x + z). It was found that my algorithm, SIACT, outperforms the existing algorithms with regard to recall and, more often than not, precision.

My presentation will give the definition of a translational pattern, discuss the improvements outlined above, and demonstrate how these improvements are being brought together in a user interface.

SELECTED REFERENCES

1. Collins, T., R. Laney, A. Willis, and P. H. Garthwaite, 'Using discovered, polyphonic patterns to filter computer-generated music', in Proceedings of the International Conference on Computational Creativity, Lisbon (2010), 1-10.
2. Conklin, D., and M. Bergeron, 'Feature set patterns in music', Computer Music Journal 32(1) (2008), 60-70.
3. Cope, D., Computational Models of Musical Creativity (Cambridge, Massachusetts: MIT Press, 2005).
4. Meredith, D., K. Lemström, and G. A. Wiggins, 'Algorithms for discovering repeated patterns in multidimensional representations of polyphonic music', in Cambridge Music Processing Colloquium, Cambridge (2003), 11 pages.
5. Meredith, D., K. Lemström, and G. A. Wiggins, 'Algorithms for discovering repeated patterns in multidimensional representations of polyphonic music', Journal of New Music Research 31(4) (2002), 321-345.
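The recall and precision measures used in the evaluation above can be computed directly from the definitions in the text; the pattern identifiers in the usage example are illustrative.

```python
def recall_precision(returned, benchmark):
    """Recall x/y and precision x/(x + z), where x = returned patterns that
    are in the analyst's benchmark, y = benchmark size, and z = returned
    patterns outside the benchmark (so x + z = number returned)."""
    returned, benchmark = set(returned), set(benchmark)
    x = len(returned & benchmark)
    recall = x / len(benchmark)
    precision = x / len(returned) if returned else 0.0
    return recall, precision
```

For example, if the analyst's benchmark is {"p1", "p2", "p3", "p4"} and an algorithm returns {"p1", "p2", "p5"}, then x = 2, giving recall 2/4 and precision 2/3.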
Semantic Adaptivity and Social Networking in Personal Learning Environments

Joe Corneli

Supervisors: Alexander Mikroyannidis, Peter Scott
Department/Institute: Knowledge Media Institute
Status: Full-time
Probation viva: Before
Starting date: 01/01/10

Introductory Remarks

I've decided to deal with "personal learning environments" with an eye towards the context of their creation and use. This entails looking not just at ways to help support learning experiences, but also at the complex of experiences and behaviours of the many stakeholders who are concerned with learning (e.g. educators, content providers, software developers, and institutional and governmental organizations). This broad view is compatible with the idea of a personal learning environment put forward by the progenitors of the PLE model: "Rather than integrate tools within a single context, the system should focus instead on coordinating connections between the user and a wide range of services offered by organizations and other individuals." (Wilson et al., 2006)

This problem area, which otherwise threatens to become hugely expansive, invites the creation of a unified methodology and mode of analysis. A key aim of my work is to develop such a method -- a sort of dynamic cartography. In this frame, the social roles of stakeholders are to be understood through their constituent actions. My analysis will then focus on the following question: How can mapping activity patterns in a social context help us support the learning process more effectively?

Thematic Issues

In order to understand patterns of interaction with data well enough to make useful maps, we must delve a bit into human sense-making behaviour. A small vocabulary of actions related to sense-making provides a model we can then use quite extensively. People look for simplifying patterns. In a countervailing trend, they look for ways to become more usefully interconnected and interoperable.
To negotiate between these two types of behaviour, they identify or create "points of coordination" which provide mechanisms of control. They may do experiments, and then document how
these mechanisms generate effects in a more or less predictable way. Finally, they develop explicit, shareable practices which achieve "desirable" effects. Simplification, interconnection, control, experiment, motivation, and praxis -- these are the thematic issues that inform my technical investigations.

Proposed Implementation Work

I plan to focus on implementation because it is an ideal place in which to refine and test my ideas about dynamic maps. My efforts will be directed largely into the following applications.

* Etherpad and other related tools for live online interactions -- Data about social interactions is all interesting and potentially useful, but data about "live" social interactions is becoming increasingly available in forms that are suitable for large-scale computational analysis and real-time use.

* RDF and related techniques for data management -- Marking up complex and changing relationships between objects is standard in e.g. computer animation and computer games; it is interesting to think about how these ideas can work in other domains (e.g. to assist with learning).

* WordNet and Latent Semantic Analysis style approaches for clustering and annotating data -- There are various techniques for dividing content into thematic clusters (useful for supporting the simplification behaviours needed for sense-making), and for annotating data with new relationships (useful for supporting interconnection behaviours). I will explore these in various applications, e.g. applying them to the streams of data identified above.

* Semantic Web style patterns for interoperability -- Content may still be king, but interfaces make up the board on which the game is played. I plan to use an existing standard for mathematical documents (OMDoc) and other API-building tools to help make the collection of mathematical resources interoperable with e.g.
OU's SocialLearn platform, contributing to the development of a public service to STEM learners and practitioners worldwide.

* Documentation of technical processes -- is an example of a tool that has more content contributors than coders, and more feature requests than anyone knows what to do with. Good documentation is part of making hacking easier. Towards this end, I'm planning to build to document the software used on PlanetMath (and many other projects).

Conclusion
By the end of my PhD project, I hope to have built a "PLE IDE" -- a tool offering personalized support for both learners and developers. I hope to have a robust theory and practice of dynamic mapping that I will have tested out in several domains related to online learning.

Reference

Wilson, S., Liber, O., Johnson, M., Beauvoir, P., Sharples, P., & Milligan, C. (2006). Personal Learning Environments: Challenging the Dominant Design of Educational Systems. In Proceedings of the 2nd International Workshop on Learner-Oriented Knowledge Management and KM-Oriented Learning, in conjunction with ECTEL 06 (pp. 67-76). Crete, Greece.
Investigating narrative 'effects': the case of suspense

Richard Doust

Supervisors: Richard Power, Paul Piwek
Department/Institute: Computing
Status: Part-time
Probation viva: Before
Starting date: October 2008

1 Introduction

Just how do narrative structures such as a Hitchcock film generate the well-known feeling of suspense? Our goal is to investigate the structures of narratives that produce various narrative effects such as suspense, curiosity and surprise. The fundamental question guiding this research could be phrased thus: what are the minimal requirements on formal descriptions of narratives such that we can capture these phenomena and generate new narratives which contain them? Clearly, the above phenomena may also depend on extra-narrative features such as music, filming angles, and so on. These will not be our primary concern here. Our approach consists of two main parts:

1. We present a simple method for defining a storybase, which for our purposes will serve to produce different 'tellings' of the same story on which we can test our suspense modelling.

2. We present a formal approach to generating the understanding of the story as it is told, and then use the output of this approach to suggest an algorithm for measuring the suspense level of a given telling of a story. We can thus compare different tellings of a story and suggest which ones will have high suspense and which ones low.

2 Suspense

2.1 Existing definitions

Dictionary definitions of the word 'suspense' suggest that there really ought to be several different words for what is more like a concept cluster than a single concept. The Collins English Dictionary gives three definitions:

1. apprehension about what is going to happen . . .
2. an uncertain cognitive state; "the matter remained in suspense for several years" . . .
3.
excited anticipation of an approaching climax; "the play kept the audience in suspense"; anticipation, expectancy - an expectation.

Gerrig and Bernardo (1994) suggest that reading fiction involves constantly looking for solutions to the plot-based dilemmas faced by the characters in a story world. One of the suggestions which comes out of this work is that suspense is greater the lower the number of solutions to the hero's current problem that can be found by the reader. Cheong and Young's (2006) narrative generating system uses the idea that a reader's suspense level depends on the number and type of solutions she can imagine in order to solve the problems facing the narrative's preferred character. Generally, it seems that more overarching and precise definitions of suspense are wanting in order to connect some of the above approaches. The point of view we will assume is that the principles by which literary narratives are designed are obscured by the lack of sufficiently analytical concepts to define them. We will use as our starting point work on stories by Brewer and Lichtenstein (1981), which seems fruitful in that it proposes not only a view of suspense, but also of related narrative phenomena such as surprise and curiosity.
2.2 Brewer and Lichtenstein's approach

Brewer and Lichtenstein (1981) propose that there are three major discourse structures which account for the enjoyment of a large number of stories: surprise, curiosity and suspense. For suspense, there must be an initiating event which could lead to significant consequences for one of the characters in the narrative. This event leads to the reader feeling concern about the outcome for this character, and if this state is maintained over time, then the reader will feel suspense. As Brewer and Lichtenstein say, often 'additional discourse material is placed between the initiating event and the outcome event, to encourage the build up of suspense' (Brewer and Lichtenstein, 1981, p.17). Much of the current work can be seen as an attempt to formalise and make robust the notions of narrative understanding that Brewer laid out. We will try to suggest a model of suspense which explains, for example, how the placing of additional material between the initiating event and the outcome event increases the suspense felt in a given narrative. We will also suggest ways in which curiosity and surprise could be formally linked to suspense. We also hope that our approach will be able to shed some light on the techniques for creating suspense presented in writers' manuals.

3 The storybase

3.1 Event structure perception

Our starting point for analysing story structure is a list of (verbally described) story events. Some recent studies (Speer, 2007) claim that people break narratives down into digestible chunks in this way. If this is the case, then we should expect to discover commonalities between different types of narrative (literature, film, storytelling), especially as regards phenomena such as suspense. One goal of this work is to discover just these commonalities.

3.2 The storybase: talking about variants of the 'same' story
One of the key points that Brewer and Lichtenstein make is that the phenomenon of suspense depends on the order in which information about the story is released, as well as on which information is released and which withheld. One might expect, following this account, that telling 'the same story' in two different ways might produce different levels of suspense. In order to be able to test different tellings of the same story, we define the notion of a STORYBASE. This consists of a set of events, together with some constraints on the set. Any telling of the events which obeys these constraints should be recognised by most listeners as being 'the same story'. We define four types of link between the members of the set of possible events: starting points, event links, causal constraints, and stopping points. The causal constraints can be positive or negative. They define, for example, which events need to have been told for others to now be able to be told. Our approach can be seen as a kind of specialised story grammar for a particular story. The grammar generates 'sentences', and each 'sentence' is a different telling of the story. The approach is different to story schemas: we are not trying to encode information about the world at this stage; any story form is possible. With this grammar, we can generate potentially all of the possible tellings of a given story which are recognisably the same story, and in this way we can test our heuristics for meta-effects such as suspense on a whole body of stories.

4 Inference

4.1 Inference types

To model the inferential processes which go on when we listen to or read a story, or watch a film, we define three types of inference:

1. Inference of basic events from sensory input: a perceived action in the narrative together with an 'event classifier module' produces a list of ordered events.
2. Inferences about the current state of the story (or deductions).
3. Inferences about the future state of the story (or predictions).
Clearly these inferential processes also rely on general knowledge about the world or the story domain, and even about stories themselves. So, for each new story event we build up a set STORYSOFAR of inferences of these three types. At each new story event, new inferences are generated and old inferences rejected. There is a constant process of maintenance of the logical coherence of the set of inferences as the story is told. To model this formally, we create a set of 'inferential triples' of the form "if X and Y then Z", or X.Y -> Z, where X, Y and Z are deductions or predictions.

5 Measuring suspense

5.1 A 'suspense grammar' on top of the storybase

To try to capture phenomena such as suspense, curiosity and surprise, we aim to create and test different algorithms which take as their input the generated story, together with the inferences generated by the triples mentioned above. A strong feature of this approach is that we can test our algorithms on a set of very closely related stories which have been generated automatically.

5.2 Modelling conflicting predictions

Our current model of suspense is based on the existence of conflicting predictions with high salience. (This notion of the salience of a predicted conflict could be defined in terms of the degree to which whole sets of following predictions for the characters in the narrative are liable to change. For the moment, intuitively, it relates to how the whole story might 'flow' in a different direction.) For the story domain, we construct the set INCOMP of pairs of mutually conflicting predictions, each with a given salience:

INCOMP = { (P1, NotP1, Salience1), (P2, NotP2, Salience2), . . . }

We can now describe a method for modelling the conflicting predictions triggered by a narrative. If at time T, P1 and NotP1 are both members of STORYSOFAR, then we have found two incompatible predictions in our 'story-so-far'.
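The maintenance of STORYSOFAR and the detection of incompatible predictions might be sketched as follows. This is a minimal illustration only: the function names, the "pred_" labels and the toy story domain are my own assumptions, not part of the proposal.

```python
# Sketch of inference maintenance over STORYSOFAR (illustrative names only).
# Inferential triples have the form "if X and Y then Z", written X.Y -> Z.

def close_under_triples(story_so_far, triples):
    """Repeatedly apply inferential triples until no new inference is added."""
    inferences = set(story_so_far)
    changed = True
    while changed:
        changed = False
        for (x, y, z) in triples:
            if x in inferences and y in inferences and z not in inferences:
                inferences.add(z)
                changed = True
    return inferences

def find_conflicts(inferences, incomp):
    """Return the conflicts (P, NotP, Salience) active in the story so far."""
    return [(p, not_p, s) for (p, not_p, s) in incomp
            if p in inferences and not_p in inferences]

# Toy story domain: the hero is predicted both to escape and to be caught.
triples = [("door_locked", "hero_inside", "pred_caught"),
           ("has_key", "hero_inside", "pred_escapes")]
story_so_far = {"door_locked", "has_key", "hero_inside"}
incomp = [("pred_escapes", "pred_caught", 10)]

inferences = close_under_triples(story_so_far, triples)
print(find_conflicts(inferences, incomp))  # [('pred_escapes', 'pred_caught', 10)]
```

At each new story event one would extend `story_so_far`, re-close it under the triples, and retract inferences that have become inconsistent; the sketch shows only the closure and conflict-detection step.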
5.3 The predictive chain

We need one further definition in order to be able to define our current suspense measure for a story. For a given prediction P1, we recursively define the 'prediction chain' function C of P1: C(P1) is the set of all predicted events P such that P.Y -> P' for some Y, where P' is a member of C(P1).

5.4 Distributing salience as a rough heuristic for modelling suspense in a narrative

Suppose we have a predicted conflict between predictionA and predictionB which has a salience of 10. In these circumstances, it would seem natural to ascribe a salience of 5 to each of the (at least) two predicted events predictionA and predictionB which produce the conflict. Now suppose that leading back from predictionA there is another predictionC that needs to be satisfied for predictionA to occur. How do we spread out the salience of the conflict over these different predicted events?

5.5 A 'thermodynamic' heuristic for creating a suspense measure

A predicted incompatibility as described above triggers the creation of CC(P1, P2, Z), the combination of the two causal chains C(P1) and C(P2) which lead up to these incompatible predictions:

CC(P1, P2, Z) = C(P1) + C(P2)

To determine our suspense heuristic, we first find the size L of CC(P1, P2, Z). At each story step we define the suspense level S in relation to the conflicting predictions P1 and P2 as S = Z / L. Intuitively, one might say that the salience of the predicted incompatibility is 'spread over', or distributed over, the relevant predictions that lead up to it. We can call this a 'thermodynamic' model because it is as if the salience or 'heat' of one predicted conflicting moment is transmitted back down the predictive line to the present moment. All events which could have a bearing on any of the predictions in the chain are for this reason subject to extra attention.
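As a minimal executable sketch of this heuristic: the names, the "pred_" prefix convention for marking predicted events, and the toy story domain below are my own illustrative assumptions, and I take C(P) to include P itself.

```python
# Sketch of the prediction chain C(P) and the S = Z / L suspense heuristic.

def prediction_chain(p, triples):
    """C(p): p plus every prediction that leads up to p via triples X.Y -> Z."""
    chain = {p}
    changed = True
    while changed:
        changed = False
        for (x, y, z) in triples:
            if z in chain:
                for q in (x, y):
                    # Only predicted events join the chain ("pred_" marks them here).
                    if q.startswith("pred_") and q not in chain:
                        chain.add(q)
                        changed = True
    return chain

def suspense_level(p1, p2, salience, triples):
    """S = Z / L, where Z is the salience and L the size of CC(P1, P2, Z)."""
    cc = prediction_chain(p1, triples) | prediction_chain(p2, triples)
    return salience / len(cc)

# Toy example: the hero's escape (pred_escapes) first requires finding a key.
triples = [("pred_finds_key", "hero_inside", "pred_escapes")]
s = suspense_level("pred_escapes", "pred_caught", 10, triples)
print(round(s, 2))  # salience 10 spread over the three chained predictions
```

As predictions in the chain are confirmed or annulled, L shrinks and S = Z / L rises, matching the claim below that suspense increases as the conflict draws nearer.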
If the set of predictions stays the same over a series of story steps, and, in a first approximation, we assume that the suspensefulness of a narrative is equivalent to the sum of the suspense levels of its story steps, then we can say that the narrative in question will have a total suspense level S-total, relative to this particular predicted conflict, of

S-total = Z/L + Z/(L-1) + Z/(L-2) + . . . + Z/1

as the number of predictions in CC(P1, P2, Z) decreases each time a prediction is either confirmed or annulled. To summarise, we can give a working definition of suspense as follows:

5.6 Definition of suspense

Definition: the suspense level of a narrative depends on the salience of predicted conflicts between two or more possible outcomes and on the amount of story time that these predicted conflicts remain unresolved and 'active'. From this definition of suspense we would expect two results:

1. the suspense level at a given story step will increase as the number of predictions that remain to be confirmed leading up to the conflict decreases, and
2. the way to maximise suspense in a narrative is for the narrative to 'keep active' predicted incompatibilities with a high salience over several story steps.

In fact, this may be just how suspenseful narratives work. One might say that suspenseful narratives engineer a spreading of the salience of key moments backwards in time, thus maintaining a kind of tension over sufficiently long periods for emotional effects to build up in the spectator.

6 Summary

We make two claims:

1. The notion of a storybase is a simple and powerful way to generate variants of the same story.
2. Meta-effects of narrative can be tested by using formal algorithms on these story variants. These algorithms build on modelling of inferential processes and knowledge about the world.

7 References

• Brewer, W. F. (1996). The nature of narrative suspense and the problem of rereading. In P. Vorderer, H. J. Wulff, and M.
Friedrichsen (Eds.), Suspense: Conceptualizations, theoretical analyses, and empirical explorations (pp. 107-127). Mahwah, NJ: Lawrence Erlbaum Associates.

• Brewer, W. F., and Lichtenstein, E. H. (1981). Event schemas, story schemas, and story grammars. In J. Long and A. Baddeley (Eds.), Attention and Performance IX (pp. 363-379). Hillsdale, NJ: Lawrence Erlbaum Associates.

• Cheong, Y. G., and Young, R. M. (2006). A computational model of narrative generation for suspense. In Computational Aesthetics: Artificial Intelligence Approaches to Beauty and Happiness: Papers from the 2006 AAAI Workshop, ed. Hugo Liu and Rada Mihalcea, Technical Report WS-06-04 (pp. 8-15). Menlo Park, CA: American Association for Artificial Intelligence.

• Gerrig, R. J., and Bernardo, A. B. I. (1994). Readers as problem-solvers in the experience of suspense. Poetics, 22(6), 459-472.

• Speer, N. K., Zacks, J. M., and Reynolds, J. R. (2007). Human brain activity time-locked to narrative event boundaries. Psychological Science, 18, 449-455.
Verifying Authentication Properties of C Security Protocol Code Using General Verifiers

François Dupressoir

Supervisors: Andy Gordon (MSR), Jan Jürjens (TU Dortmund), Bashar Nuseibeh (Open University)
Department: Computing
Registration: Full-time
Probation: Passed

1 Introduction

Directly verifying security protocol code could help prevent major security flaws in communication systems. C is usually used when implementing security software (e.g. OpenSSL, cryptlib, PolarSSL) because it provides control over side-channels, performance, and portability all at once, along with being easy to call from a variety of other languages. But those strengths also make it hard to reason about, especially when dealing with high-level logical properties such as authentication.

Verifying high-level code. The most advanced results on verifying implementations of security protocols tackle high-level languages such as F#. Two main verification trends can be identified for high-level languages. The first aims at soundly extracting models from the program code, and using a cryptography-specific tool such as ProVerif (e.g. fs2pv [BFGT06]) to verify that the extracted protocol model is secure with respect to a given attacker model. The second approach aims at using general verification tools such as type systems and static analysis to verify security properties directly on the program code. Using general verification tools permits a user with less expert knowledge to verify a program, and also allows a more modular approach to verification, even in the context of security, as argued in [BFG10].

Verifying C code. But very few widely-used security-oriented programs are written in such high-level languages, and lower-level languages such as C are usually favoured.
Several approaches have been proposed for analysing C security protocol code [GP05, ULF06, CD08], but we believe them unsatisfactory for several reasons:

• memory-safety assumptions: all three rely on assuming memory-safety
properties,1

• trusted manual annotations: all three rely on a large amount of trusted manual work,

• unsoundness: both [CD08] and [ULF06] make unsound abstractions and simplifications, which is often not acceptable in a security-critical context,

• scalability issues: [CD08] is limited to bounded (in practice small) numbers of parallel sessions, and we believe [GP05] is limited to small programs due to its whole-program analysis approach.

1.1 Goals

Our goal is to provide a new approach to soundly verify Dolev-Yao security properties of real C code, with a minimal amount of unverified annotations and assumptions, so that it is accessible to non-experts. We do not aim at verifying implementations of encryption algorithms and other cryptographic operations, but rather their correct usage in secure communication protocols such as TLS.

2 Framework

Previous approaches to verifying security properties of C programs did not define attacker models at the level of the programming language, since they were based on extracting a more abstract model from the analysed C code (CSur and ASPIER), or simply verified compliance of the program with a separate specification (as in Pistachio). However, to achieve our scalability goals, we choose to define an attacker model on C programs that enables a modular verification of the code. To avoid issues related to the complex, and often very informal, semantics of the C language, we use the F7 notion of a refined module (see [BFG10]). In F7, a refined module consists of an imported and an exported interface, containing function declarations and predicate definitions, along with a piece of type-checked F# code. The main result states that a refined module with an empty imported interface cannot go wrong, and careful use of assertions allows one to statically verify correspondence properties of the code.
Composition results can also be used to combine existing refined modules whilst ensuring that their security properties are preserved. We define our attacker model on C programs by translating F7 interfaces into annotated C header files. The F7 notion of an opponent, and the corresponding security results, can then be transferred to C programs that implement an F7-translated header. The type-checking phase in F7 is, in the case of C programs, replaced by a verification phase, in our case using VCC. We trust that VCC is sound, and claim that verifying that a given C program correctly implements a given annotated C header entails that there exists an equivalent (in terms of attacks within our attacker model) F7 implementation of that same interface.

1 Which may sometimes be purposefully broken as a source of randomness.
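To make the correspondence idea concrete, here is a small runtime stand-in of my own devising. In the approach described above, begin/end events are static annotations discharged by a verifier such as VCC, not runtime checks; the names and the toy protocol message are hypothetical.

```python
# Illustrative runtime stand-in for begin/end correspondence assertions.
# The property: every end_event(m) must be preceded by a begin_event(m).

begun = set()  # messages for which a protocol run has been started

def begin_event(msg):
    """The initiator marks the start of a protocol run for message msg."""
    begun.add(msg)

def end_event(msg):
    """The responder accepts msg; correspondence requires a prior begin."""
    assert msg in begun, "correspondence violated: end without begin"

begin_event("request:42")
end_event("request:42")      # fine: begin precedes end

try:
    end_event("request:99")  # no matching begin: the assertion fires
except AssertionError as err:
    print("caught:", err)
```

A static verifier proves that the assertion in `end_event` can never fire in any execution, for any opponent built against the exported interface, rather than checking it at run time as this sketch does.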
3 Case Study

We show how our approach can be used in practice to verify a simple implementation of an authenticated Remote Procedure Call protocol that authenticates the pair of communicating parties using a pre-shared key and links requests and responses together. We show that different styles of C code can be verified using this approach, with varying levels of required annotations, very few of which are trusted by the verifier. We argue that a large part of the required annotations are memory-safety related and would be necessary to verify other properties of the C code, including to verify the memory-safety assumptions made by previous approaches.

4 Conclusion

We define an attacker model for C code by interpreting verified C programs as F7 refined modules. We then describe a method to statically prove the impossibility of attacks against C code in this attacker model using VCC [CDH+09], a general C verifier. This approach does not rely on unverified memory-safety assumptions, and the amount of trusted annotations is minimal. We also believe it is as sound and scalable as the verifier that is used. Moreover, we believe our approach can be adapted for use with any contract-based C verifier, and could greatly benefit from the important recent developments in that area.

References

[BFG10] Karthikeyan Bhargavan, Cédric Fournet, and Andrew D. Gordon. Modular verification of security protocol code by typing. In Proceedings of the 37th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '10), pages 445-456, Madrid, Spain, 2010.

[BFGT06] Karthikeyan Bhargavan, Cédric Fournet, Andrew D. Gordon, and Stephen Tse. Verified interoperable implementations of security protocols. In CSFW '06: Proceedings of the 19th IEEE Workshop on Computer Security Foundations, pages 139-152, Washington, DC, USA, 2006. IEEE Computer Society.

[CD08] Sagar Chaki and Anupam Datta.
ASPIER: An automated framework for verifying security protocol implementations. Technical Report CMU-CyLab-08-012, CyLab, Carnegie Mellon University, 2008.

[CDH+09] Ernie Cohen, Markus Dahlweid, Mark Hillebrand, Dirk Leinenbach, Michal Moskal, Thomas Santen, Wolfram Schulte, and Stephan Tobies. VCC: A practical system for verifying concurrent C. In Proceedings of the 22nd International Conference on Theorem Proving in Higher Order Logics, pages 23-42, Munich, Germany, 2009. Springer-Verlag.

[GP05] Jean Goubault-Larrecq and Fabrice Parrennes. Cryptographic protocol analysis on real C code. In Proceedings of the 6th International
Conference on Verification, Model Checking and Abstract Interpretation (VMCAI'05), volume 3385 of Lecture Notes in Computer Science, pages 363-379. Springer, 2005.

[ULF06] Octavian Udrea, Cristian Lumezanu, and Jeffrey S. Foster. Rule-based static analysis of network protocol implementations. In Proceedings of the 15th USENIX Security Symposium, pages 193-208, 2006.
Agile development and usability in practice: Work cultures of engagement

Jennifer Ferreira

Supervisors: Helen Sharp, Hugh Robinson
Department/Institute: Computing
Status: Full-time
Probation viva: After
Starting date: February 2008

Abstract. Combining usability and Agile development is a complex topic. My academic research, combined with my research into practice, suggests three perspectives from which the topic can be usefully examined. The first two (addressing focus and coordination issues) are typically the perspectives taken in the literature and are popular items for discussion. I propose that there is a third, largely unexplored perspective that requires attention: that of how developers and designers engage in the context of their work cultures.

1 Introduction

Both disciplines are still in a state of uncertainty about how one relates to the other — in terms of whether they are addressing the same underlying issues, whether they belong to and should be recognised as one "process", who takes the lead and who adjusts to whom. The complexity of the problem arises from practitioner and academic contributions to the literature, as well as the varying perspectives the contributors hold. Complexity further arises from the practical settings in which the problem plays out, settings characterised by different balances of power and different levels of influence the designers and developers may have on determining how they work. What is clear is that the solutions proposed follow the ways in which the problem is conceptualised. It certainly matters how the problem is conceptualised, as this reflects which issues are important enough to address and the ways to go about doing that. In light of this, we can unpick from the complexity three emerging strands of discussion that deal with usability in an agile domain.
For the benefit of the following discussion, I am making the assumption that a developer constituency exists separately from a designer constituency, and further, that if questioned, a developer would not consider themselves to be doing the work of a designer, and vice versa. Of course, this is not always the case in practice. I have encountered Agile teams with no dedicated usability person assigned to work with the team, where developers were addressing usability-related issues as part of their everyday work. This illustrates yet another layer of complexity associated with practice that must be acknowledged, but cannot be adequately addressed within the limitations of this paper.

2 A question of focus

In the first perspective, the combination of usability approaches with Agile approaches helps practitioners focus on important aspects of software development. While Agile approaches focus on creating working software, usability approaches focus on creating a usable design that may or may not be in the form of working software. A central concern of this perspective is how to support the weaknesses of one with the strengths of the other. Agile approaches are seen to lack an awareness of usability issues, with little guidance for how and when designers contribute to the process. Usability approaches are seen to lack a structured approach to transforming designs into working software and, therefore, offer little guidance on how developers are involved. They are thus seen as complementary approaches that, used together, improve the outcome of the software development effort. This often serves as the motivation for combining Agile development and usability in the first place. We find examples in the literature that combine established Agile approaches, e.g., eXtreme Programming, or
Scrum, with established design approaches, e.g., Usage-Centered Design [6] or Usability Engineering [5]. We also find examples of well-known HCI techniques such as personas [1] and scenarios [3] being used on Agile projects.

3 A question of coordination

The second perspective on how to bring usability and Agile development together is one where it is considered a problem of coordination. That is, the central concern is how to allow the designers and developers to carry out their individual tasks, and bring them together at the appropriate points. Designers require enough time at the outset of the project to perform user research and sketch out a coherent design. To fit with the time-boxed Agile cycles, usability techniques are often adapted to fit within shorter timescales. Advice is generally to have designers remain ahead of the developers, so that they have enough time to design for what is coming ahead and to evaluate what has already been implemented. In the literature we find examples of process descriptions as a way of addressing this coordination issue. They provide a way to mesh the activities of both designers and developers, by specifying the tasks that need to be performed in a temporal sequence (e.g., [4]).

4 Work cultures of engagement

The third perspective addresses practical settings and has received little attention so far. In this perspective, rather than concentrating on processes or rational plans that abstract away from the circumstances of the actions, the situatedness of the work of the developers and designers is emphasised. This perspective encompasses both of those discussed above, while acknowledging that issues of coordination and focus are inextricably linked with the setting in which practitioners work. That is, how the developers and designers coordinate their work, and how focus is maintained, is in practice shaped and sustained by their work setting.
By work culture I specifically mean the "set of solutions produced by a group of people to meet specific problems posed by the situation that they face in common" [2, p.64], in a work setting. If developers and designers are brought together by an organisation, they will be working together amid values and assumptions about the best way to get the work done — the manifestations of a work culture. I combine work cultures with engagement to bring across the point that how developers and designers engage with one another depends in essential ways on the embedded values and assumptions regarding their work and what is considered appropriate behaviour in their circumstances. My research into practice has provided evidence for how practical settings shape developers and designers engaging with one another. We find that developers and designers get the job done through localised, contingent and purposeful actions that are not explained by the perspectives above. Further, the developers and designers can be embedded in the same work culture, such that they share values, assumptions and behaviours for getting the work done. But we have also encountered examples where developers and designers are in separate groups and embedded in distinct work cultures. Engaging in this sense requires that individuals step outside their group boundaries and figure out how to deal with each other on a daily basis — contending with very different values, assumptions and behaviours compared to their own. This is an important perspective to consider because of the implications for practice that it brings — highlighting the role of work culture, self-organisation and purposeful work. It is also a significant perspective, since we are unlikely to encounter teams in practice who are fully self-directed and independent of other teams, individuals or organisational influences.
5 Concluding remarks

As we work through the problems that crossing disciplinary boundaries presents, we simultaneously need an awareness of which conception of the problem is actually being addressed. In this paper I have identified a third perspective requiring attention, in which we take account of the work settings where the combination of Agile development and usability is played out. According to this perspective, it would be unrealistic to expect that one ideal approach will emerge and successfully translate to any other work setting. Instead, it shifts attention to the work cultures involved in usability and Agile development in practice. It shows how understanding and supporting the mechanisms of the work cultures that achieve engagement in a given setting contributes to understanding and supporting the mechanisms that enable usability in an agile domain.

References

1. Haikara, J.: Usability in Agile Software Development: Extending the Interaction Design Process with Personas Approach. In: Concas, G., Damiani, E., Scotto, M., Succi, G. (eds.)
Agile Processes in Software Engineering and Extreme Programming. LNCS, vol. 4536, pp. 153–156. Springer, Berlin/Heidelberg (2007)
2. Vaughan, D.: The Challenger Launch Decision: Risky Technology, Culture and Deviance at NASA. The University of Chicago Press, Chicago and London (1996)
3. Obendorf, H., Finck, M.: Scenario-based usability engineering techniques in agile development processes. In: CHI '08 Extended Abstracts on Human Factors in Computing Systems (Florence, Italy, April 5–10, 2008), pp. 2159–2166. ACM, New York, NY (2008)
4. Sy, D.: Adapting usability investigations for Agile user-centered design. Journal of Usability Studies 2(3), 112–132 (2007)
5. Kane, D.: Finding a Place for Discount Usability Engineering in Agile Development: Throwing Down the Gauntlet. In: Proceedings of the Conference on Agile Development (June 25–28, 2003), p. 40. IEEE Computer Society, Los Alamitos, CA (2003)
6. Patton, J.: Hitting the target: adding interaction design to agile software development. In: OOPSLA 2002 Practitioners Reports (Seattle, Washington, November 4–8, 2002), pp. 1-ff. ACM, New York, NY (2002)
Model Driven Architecture of Large Distributed Hard Real Time Systems

Michael A Giddings

Supervisors: Dr Pat Allen, Dr Adrian Jackson, Dr Jan Jürjens, Dr Blaine Price
Department/Institute: Department of Computing
Status: Part-time
Probation viva: Before
Starting date: 1 October 2008

1. Background

Distributed real-time process control systems are notoriously difficult to develop. The problems are compounded where there are multiple customers and the design responsibility is split between different companies based in different countries. The customers are typically users rather than developers, and the domain expertise resides within organisations whose domain experts have little software expertise.

Two types of distributed real-time process control system are open loop systems (without feedback) and closed loop systems (with feedback). Typical examples are used for the display of sensor data and the control of actuators based on sensor data. Systems typically contain a mixture of periodic and event-driven processing, with states changing much more slowly than individual periodic processing steps.

In addition to the functional requirements, non-functional requirements are also needed to describe the desired operation of the software system. A number of these requirements may be grouped together as performance requirements. Performance requirements are varied and depend on the particular system to which they refer. In early systems performance was managed late in the development process on a 'fix it later' basis (Smith 1990). As software systems became more sophisticated, it became necessary to manage performance issues as early as possible to avoid the cost impact of late-detected performance failures.

2. The Problem

The need for modelling performance for the early detection of performance failures is well established (Smith 1990).
Recent surveys have shown that the adoption of the Unified Modelling Language (UML) in software systems development remains low, at 16%, with no expected upturn, and that the use of trial-and-error methods in embedded system development remains at 25% (Sanchez and Acitores 2009).
A number of summary papers list the available performance assessment methods and tools (Smith 2007; Balsamo, Di Marco et al. 2004; Koziolek 2009; Woodside, Franks et al. 2007). These identify performance assessment methods suitable for event-driven systems, client/server systems, layered queuing networks and systems with shared resources. Fifteen performance approaches identified to combat the 'fix-it-later' approach have been summarised (Balsamo, Di Marco et al. 2004). These methods apply to a broad range of software systems and performance requirements. In particular they cover shared resources (Hermanns, Herzog et al. 2002), client/server systems (Huhn, Markl et al. 2009) and event-driven systems (Staines 2006; Distefano, Scarpa et al. 2010), and mainly focus on business systems. Each of these performance methods can contribute to the performance analysis of distributed real-time process control systems, but all rely on the system architecture and software design being wholly or partly complete.

3. Proposed Solution

In this paper I propose modelling the individual system elements (sensors, actuators, displays and communication systems) as periodic processes associated with a statistical description of their errors and delays. Existing performance methods based on MARTE (OMG 2009), using the techniques described above, can be used to calculate the performance of individual elements. The proposed methodology, however, enables models to be developed early for systems which comprise individual processing elements, sensors, actuators, displays and controls linked by a bus structure, prior to the development of UML models. System architects establish the components and component communications early in the system lifecycle. Tools based on SysML 1.1 (OMG 2008) provide a method of specifying the system architecture. These design decisions frequently occur prior to any detailed performance assessment.
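To illustrate the kind of early, pre-architecture estimate that modelling elements as periodic processes makes possible, the following sketch combines a chain of unsynchronised periodic elements into worst-case and average end-to-end latency figures. All element names, periods and delays are invented, and the one-full-period-wait-per-stage model is a common simplification, not a method taken from this paper.

```python
# Sketch: end-to-end delay through a chain of unsynchronised periodic
# elements (sensor -> bus -> processor -> display). Assumes each element
# samples its input once per period, so in the worst case a fresh value
# just misses a cycle and waits a full period at every stage. The names
# and figures below are illustrative only.

elements = [
    # (name, period in ms, processing/transport delay in ms)
    ("sensor",    20.0, 1.0),
    ("bus",       10.0, 0.5),
    ("processor", 40.0, 5.0),
    ("display",   50.0, 2.0),
]

def worst_case_latency(chain):
    """Fresh data can wait up to one full period at every stage."""
    return sum(period + delay for _, period, delay in chain)

def average_latency(chain):
    """On average, data waits half a period at each stage."""
    return sum(period / 2 + delay for _, period, delay in chain)

print(worst_case_latency(elements))  # 128.5 ms
print(average_latency(elements))     # 68.5 ms
```

Even this crude model shows how an element's iteration period, not just its processing time, dominates end-to-end latency, which is the kind of insight available before any software design exists.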
Early performance predictions enable performance requirements to be established for individual system elements with greater confidence than the previous 'fix-it-later' approach (Eeles 2009). It has been claimed (Lu, Halang et al. 2005; Woodside, Franks et al. 2007) that Model Driven Architecture (MDA) (OMG 2003) can aid in assessing performance. A periodic processing architecture may enable early assessment of performance by permitting loosely coupled functional elements to be used as the building blocks of a system. A high level of abstraction and automatic translation between models can be achieved using functional elements. Platform independent models for the individual components of the system, combined with scheduling information for each component, may enable the impact of functional changes and real performance to be assessed early in the development process. Models for individual elements can be combined, taking into account that the iteration schedules for each element are not synchronised with each other. These models can be animated, or their performance calculated with established mathematical methods (Sinha 1994).

One way that MDA may be used to provide early performance assessment is to develop a functional model similar to CoRE (Mullery 1979) alongside the UML (OMG 2003) models in the MDA Platform Independent Model. The functional model
  33. 33. 2010 CRC PhD Student Conference can then be developed by domain experts without any knowledge of software techniques. For central system computers it can also be used to identify classes and methods in the MDA Platform Independent Model by a simple semi-automatic process similar to the traditional noun and verb annunciation methods. It can be used to identify simple functional elements which can be implemented as part of a periodic iteration architecture. Animation of these functional elements at the requirements stage may be undertaken in a way which will reflect the actual performance of the computer. Non periodic processing elements, bus systems, sensors, actuators, displays and controls can be represented by abstractions based on an iteration schedule. This model can be used to specify the requirements for individual elements Connections between the independent functional elements which represent the notional data flow across a periodic system can be used to establish functional chains which can identify all the functional elements that relate to each specific end event. Each functional chain can then be analysed into a collection of simple sub-chains. Not all of which will have the same performance requirements when combined to meet the overall performance requirement. When each of the sub-chains has been allocated its own performance criteria individual functional elements can be appropriately scheduled within a scheduling plan with each element only being scheduled to run sufficiently frequently to meet the highest requirement of each sub-chain. This leads to a more efficient use of processing capacity than conventional periodic systems. This provides three opportunities to animate the overall system which should produce similar results. The first opportunity is to schedule algorithms defined within the definition of each functional element in the functional model associated with the MDA Platform Independent Model. 
The second opportunity is to animate the object-oriented equivalent of the functional chain in the UML models in the MDA Platform Independent Model (PIM) for the central processing elements. These would combine sequence diagrams, representing the functional model's functional elements, with objects and attributes of objects representing the notional data flow, and would be combined with the functional chains for the remaining system elements. The third opportunity is to replace the functional chains generated from the PIM with implemented functional elements from the MDA Platform Specific Models (PSMs).

Each animation would use standard iteration architectures to execute each functional element in the right order at the correct moment, in accordance with regular, predictable scheduling tables. The iteration parameters can be generated in a form which can be applied to each animation opportunity, and to the final implementation, without modification. Functional chains can be extracted from the functional model and animated independently, enabling full end-to-end models to be animated using modest computing resources.
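A minimal sketch of how such a regular, predictable scheduling table might be generated from per-element iteration rates is given below. The frame rates and element names are invented, and the code assumes every element's rate divides the minor-frame rate exactly so that each release falls on a frame boundary; this simple cyclic-executive form is an illustrative assumption, not the paper's prescribed mechanism.

```python
# Sketch: generating a scheduling table (a simple cyclic executive) from
# per-element iteration rates. The table lists, for each scheduler tick,
# which functional elements are released. Rates are assumed to divide the
# tick rate evenly; all numbers are hypothetical.

MINOR_FRAME_HZ = 100      # scheduler tick rate
MAJOR_FRAME_TICKS = 100   # one second's worth of minor frames

element_rates_hz = {"read_sensor": 50, "filter": 25, "update_display": 10}

def build_schedule(rates):
    """Release each element every (tick_rate / element_rate) ticks."""
    table = []
    for tick in range(MAJOR_FRAME_TICKS):
        slot = [name for name, hz in rates.items()
                if tick % (MINOR_FRAME_HZ // hz) == 0]
        table.append(slot)
    return table

schedule = build_schedule(element_rates_hz)
print(schedule[0])   # ['read_sensor', 'filter', 'update_display']
print(schedule[2])   # ['read_sensor']
```

Because the same table drives every animation opportunity and the final implementation, the timing behaviour observed in early animations carries over without modification, which is the property the paragraph above relies on.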
4. Conclusion

The proposed methodology enables performance to be animated or calculated early in the design process, automatically generating models focused on the sections of the system that relate to individual performance end events, prior to the architectural and software structures being finalised.

5. References

Balsamo, S., A. Di Marco, et al. (2004). "Model-based performance prediction in software development: a survey." IEEE Transactions on Software Engineering 30(5): 295-310.
Distefano, S., M. Scarpa, et al. (2010). "From UML to Petri Nets: the PCM-Based Methodology." IEEE Transactions on Software Engineering PP(99): 1-1.
Eeles, P. and P. Cripps (2009). The Process of Software Architecting. Addison Wesley Professional.
Hermanns, H., U. Herzog, et al. (2002). "Process algebra for performance evaluation." Theoretical Computer Science 274(1-2): 43-87.
Huhn, O., C. Markl, et al. (2009). "On the predictive performance of queueing network models for large-scale distributed transaction processing systems." Information Technology & Management 10(2/3): 135-149.
Koziolek, H. (2009). "Performance evaluation of component-based software systems: A survey." Performance Evaluation, in press.
Lu, S., W. A. Halang, et al. (2005). A component-based UML profile to model embedded real-time systems designed by the MDA approach. Proceedings of the 11th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications.
Mullery, G. P. (1979). CORE - a method for controlled requirement specification. Proceedings of the 4th International Conference on Software Engineering, Munich, Germany. IEEE Press.
OMG (2003). "MDA Guide Version 1.0.1." OMG/2003-06-01.
OMG (2003). "UML 1.x and 2.x." Object Management Group.
OMG (2008). OMG Systems Modelling Language (SysML) 1.1.
OMG (2009). "UML Profile for MARTE" 1.0.
Sanchez, J. L. F. and G. M. Acitores (2009).
Modelling and evaluating real-time software architectures. Reliable Software Technologies - Ada-Europe 2009, 14th Ada-Europe International Conference on Reliable Software Technologies, Brest, France. Springer Verlag.
Sinha, N. K., Ed. (1994). Control Systems. New Age International.
Smith, C. (1990). Performance Engineering of Software Systems. Addison Wesley.
Smith, C. (2007). Introduction to Software Performance Engineering: Origins and Outstanding Problems. Formal Methods for Performance Evaluation: 395-428.
Staines, T. S. (2006). Using a timed Petri net (TPN) to model a bank ATM. Proceedings of the 13th Annual IEEE International Symposium and Workshop on Engineering of Computer Based Systems (ECBS 2006).
Woodside, M., G. Franks, et al. (2007). The Future of Software Performance Engineering. Future of Software Engineering (FOSE '07), Minneapolis, MN.
An Investigation Into Design Diagrams and Their Implementations

Alan Hayes

Supervisors: Dr Pete Thomas, Dr Neil Smith, Dr Kevin Waugh
Department/Institute: Computing Department
Status: Part-time
Probation viva: After
Starting date: 1st October 2005

The broad theme of this research is the application of information technology tools and techniques to automatically generate formative feedback based upon a comparison of two separate, but related, artefacts. An artefact is defined as a mechanism through which a system is described. When two artefacts are compared, both describe the same system but do so through differing semantic and modelling constructs. For example, in the case of a student coursework submission, one artefact would be a student-submitted design diagram (using the syntax and semantics of UML class diagrams) and the second would be the accompanying student-submitted implementation (using Java syntax and semantics). Both artefacts represent the student's solution to an assignment brief set by the tutor: the design diagram describes the solution using one set of semantic representations (UML class diagrams) whilst the implementation represents the same solution using an alternative set (Java source code). An alternative example would be that of a student submitting an entity-relationship diagram with an accompanying SQL implementation. This research aims to identify the generic mechanisms needed for a tool to compare two different, but related, artefacts and generate meaningful formative feedback based upon this comparison. A case study is presented that applies these components to the automatic generation of formative assessment feedback to students based upon their submissions.
The specific area of formative feedback being addressed is based upon a comparison between the submitted design and the accompanying implementation. Constituent components described within each artefact are considered consistent if, despite the differing modelling constructs, they describe features common to both artefacts. The design (in diagrammatic format) is viewed as prescribing the structure and function contained within the implementation, whilst the implementation (source code) is viewed as implementing the design whilst adhering to its specified structure and function.

There are several major challenges and themes that feed into this issue. The first is how the consistency between a student-submitted design and its implementation can be measured in such a way that meaningful formative feedback can be generated. This involves being able to represent both components of the student submission in a form that facilitates their comparison. Thomas et al [2005] and Smith et al [2004] describe a method of reducing a student diagram into meaningful minimum components. Tselonis et al
  36. 36. 2010 CRC PhD Student Conference [2005] adopt a graphical representation mapping entities to nodes and relationships to arcs. Consequently, one component of this research addresses how the student submitted design and its source code representation can be reduced to their constituent meaningful components. The second challenge associated with this research addresses the problem of how to facilitate a meaningful comparison between these representations and how the output of a comparison can be utilised to produce meaningful feedback. This challenge is further complicated as it is known that the student submission will contain errors. Smith et al [2004] and Thomas et al [2005] identified that the student diagrams will contain data that is either missing or extraneous. Thomasson et al [2006] analysed the designs of novice undergraduate computer programmers and identified a range of typical errors found in the student design diagrams. Additionally, Bollojou et al [2006] analysed UML modelling errors made by novice analysts and have identified a range of typical semantic errors made. Some of these errors will propagate into the student implementation whilst some will not. This research investigates how such analysis and classifications can be used to support the development of a framework that facilitates the automation of the assessment process. This work will be complemented by an analysis of six data sets collated for this research. Each data set is comprised of a set of student diagrams and their accompanying implementations. It is anticipated that this work will be of interest to academic staff engaged in the teaching, and consequently assessment, of undergraduate computing programmes. It will also be of interest to academic staff considering issues surrounding the prevention of plagiarism. 
Additionally, it will be of interest to those engaged in the field of software engineering, and in particular to those involved in the auditing of documentation and practice.

References

[1] Higgins C., Colin A., Gray G., Symeonidis P. and Tsintsifas A. 2005. Automated Assessment and Experiences of Teaching Programming. Journal on Educational Resources in Computing (JERIC), Volume 5, Issue 3, September 2005. ACM Press.
[2] Thomasson B., Ratcliffe M. and Thomas L. 2006. Identifying Novice Difficulties in Object Oriented Design. In Proceedings of Information Technology in Computer Science Education (ITiCSE '06), June 2006, Bologna, Italy.
[3] Bolloju N. and Leung F. 2006. Assisting Novice Analysts in Developing Quality Conceptual Models with UML. Communications of the ACM, June 2006, Vol 49, No. 7, pp 108-112.
[4] Tselonis C., Sargeant J. and Wood M. 2005. Diagram Matching for Human-Computer Collaborative Assessment. In Proceedings of the 9th International Conference on Computer Assisted Assessment, 2005.
[5] Smith N., Thomas P. and Waugh K. 2004. Interpreting Imprecise Diagrams. In Proceedings of the Third International Conference on Theory and Applications of Diagrams, March 22-24, Cambridge, UK. Springer Lecture Notes in Computer Science, eds: Alan Blackwell, Kim Marriott, Atsushi Shimojima, 2980, 239-241. ISBN 3-540-21268-X.
[6] Thomas P., Waugh K. and Smith N. 2005. Experiments in the Automated Marking of ER-Diagrams. In Proceedings of the 10th Annual Conference on Innovation and Technology in Computer Science Education (ITiCSE 2005), Lisbon, Portugal, June 27-29, 2005.
An Investigation into Interoperability of Data Between Software Packages used to Support the Design, Analysis and Visualisation of Low Carbon Buildings

Robina Hetherington

Supervisors: Robin Laney, Stephen Peake
Department/Institute: Computing
Status: Full-time
Probation viva: Before
Starting date: January 2010

This paper outlines a preliminary study into the interoperability of building design and energy analysis software packages. It will form part of a larger study into how software can support the design of interesting and adventurous low carbon buildings. The work is interdisciplinary, and is concerned with design, climate change and software engineering.

Research Methodology

The study will involve a blend of research methods. Firstly, the key literature surrounding the study will be critically reviewed. A case study will look at the modelling of built form, with reflection upon the software and processes used. The model used in the case study will then be used to enable the analysis of data movement between software packages. Finally, conclusions will be drawn regarding the structures, hierarchies and relationships between the interoperable languages used in the process. This will inform the larger study into how software can support the design of interesting and adventurous low carbon buildings.

Research questions:
1. What are the types of software used to generate building models and conduct the analysis of energy performance?
2. What is the process involved in the movement of data from design software to energy analysis software to enable the prediction of the energy demands of new buildings?
3. What are the potential limitations of current interoperable languages used to exchange data and visualise the built form?

Context

Software has an important role in tackling climate change; it is "a critical enabling technology" [1].
Software tools can be used to support decision making surrounding climate change in three ways: prediction of the medium- to long-term effects; formation and analysis of adaptation strategies; and support of mitigation methods. This work falls into the latter category: reducing the sources of greenhouse gases through energy efficiency and the use of renewable energy sources [2]. Climate change is believed to be caused by increased anthropogenic emissions of greenhouse gases, one of the major ones being carbon dioxide. In the UK
  39. 39. 2010 CRC PhD Student Conference the Climate Change Act of 2008 has set legally binding targets to reduce the emission of carbon dioxide by 80% from 1990 levels by 2050 [3]. As buildings account for almost 50% of UK carbon dioxide emissions the necessary alteration of practices related to the construction and use of buildings will have a significant role in achieving these targets [4]. In 2007 the UK Government announced the intention that all new houses would be carbon neutral by 2016 in the “Building a Greener Future: policy statement”. This is to be achieved by progressive tightening of Building Regulations legislation over a number of years [4]. Consultations are currently taking place on the practicalities of legislating for public sector buildings and all new non- domestic buildings to be carbon neutral by 2018 and 2019 respectively [5]. The changes in praxis in the next 20-30 years facing the construction industry caused by this legislation are profound [6]. Software used in building modelling Architecture has gone through significant changes since the 1980s when CAD [Computer Aided Draughting/Design] was introduced. The use of software has significantly altered working practices and enabled imaginative and inspiring designs, sometimes using complex geometries only achievable through the use of advanced modelling and engineering computational techniques. However, the advances in digital design media have created a complex web of multiple types of software, interfaces, scripting languages and complex data models [7]. The types of software used by architects can be grouped into three main categories: CAD software that can be used to generate 2D or 3D visualizations of buildings. This type of software evolved from engineering and draughting practices, using command line techniques to input geometries. This software is mainly aimed at imitating paper based practices, with designs printed to either paper or pdf. 
Visualisation software is generally used in the early design stages for generating high quality renderings of the project. BIM [Building Information Modelling] software has been a significant development in the last few years. BIM software contains the building geometry and the spatial relationships of building elements in 3D. It can also hold geographic information, and the quantities and properties of building components, with each component recorded as an 'object' in a backend database. Building models of this type are key to the calculations now required to support zero carbon designs [8]. Examples of BIM software are Revit by Autodesk [9], ArchiCAD by Graphisoft [10], and products by Bentley Systems [11].

Energy analysis software

Analysis software is used to perform calculations such as heat loss, solar gains, lighting, acoustics, etc. This type of analysis is usually carried out by a specialist engineer, often subsequent to the architectural design. The available tools are thus aimed at expert engineers who have the explicit knowledge needed to run the simulations and interpret their results. This means that, until recent legislative changes, there was no need for holistic performance assessment to be integrated into design software [12]. Calculation of energy consumption requires a model of the proposed building to make the detailed estimates possible. Examples of expert tools that use models for the calculation are TRNSYS [13], IES Virtual Environment [14] and EnergyPlus [15]. One tool that supports the architectural design process is Ecotect [16], which has a more intuitive graphical interface and support for conducting a performance analysis [12].
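As a minimal example of the kind of calculation an energy analysis package derives from a building model, the sketch below sums steady-state fabric heat loss (Q = U · A · ΔT) over a building envelope. The areas, U-values and temperatures are illustrative only and are not taken from any real building or from the packages named above.

```python
# Sketch: steady-state fabric heat loss for a building envelope, the kind
# of calculation energy analysis tools perform from BIM geometry and
# component properties. Q = U * A * deltaT, summed over envelope elements.
# All figures below are invented for illustration.

envelope = [
    # (element, area in m^2, U-value in W/(m^2*K))
    ("external wall", 120.0, 0.30),
    ("roof",           80.0, 0.16),
    ("glazing",        25.0, 1.40),
    ("ground floor",   80.0, 0.25),
]

def fabric_heat_loss(building_elements, inside_c=21.0, outside_c=-1.0):
    """Total fabric heat loss in watts at the given design temperatures."""
    delta_t = inside_c - outside_c
    return sum(area * u_value for _, area, u_value in building_elements) * delta_t

# 120*0.30 + 80*0.16 + 25*1.40 + 80*0.25 = 103.8 W/K of fabric loss,
# so roughly 2.28 kW at a 22 K inside/outside temperature difference.
print(fabric_heat_loss(envelope))
```

Even this trivial calculation shows why interoperability matters: every area and U-value it consumes already exists in the BIM model, so the question is how reliably that data survives the journey from design software to analysis software.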