
Martin Chapman: Research Overview, 2017

King's College London


  1. 1. Research Overview Provenance Themes Dr. Martin Chapman March 17, 2017 King’s College London martin.chapman@kcl.ac.uk 1
  2. 2. Overview Learning the language of error Playing Hide-And-Seek Storing and processing MT300s 2
  3. 3. Learning the language of error
  6. 6. Problem Problem: One of the assertions in a program fails, based upon given input, and you want to know how the sequence of method calls in your program might have had an impact on this.

    static void gl_read() {
        do { gl_insert(nondet_int()); } while (nondet_int());
    }

    static void gl_insert(int value) {
        struct node *node = malloc(sizeof *node);
        node->value = value;
        list_add(&node->linkage, &gl_list);
        INIT_LIST_HEAD(&node->nested);
    }

    static inline void __list_add(struct list_head *new, struct list_head *prev, struct list_head *next) {
        next->prev = new;
        new->next = next;
        new->prev = prev;
        prev->next = new;
    }

    static void inspect(const struct list_head *head) {
        head = head->prev;
        const struct node *node = list_entry(head, struct node, linkage);
        assert(node->nested.prev == &node->nested);
        for (head = head->next; &node->linkage != head; head = head->next);
    }

    static inline void __list_del(struct list_head *prev, struct list_head *next) {
        next->prev = prev;
        prev->next = next;
    }

    static inline void list_add(struct list_head *new, struct list_head *head) {
        __list_add(new, head, head->next);
    }

    int main() {
        gl_read();
        inspect(&gl_list);
    }

  Analysing large amounts of code to understand this can be difficult. 3
  10. 10. Solution: Learning the language of error Proposed solution: Summarise all the paths that lead to a failing program assertion as a DFA [Chapman et al., 2015]. [Figure: the program is passed to Learn, which produces a DFA D whose transitions are labelled with the calls gl_read, gl_insert, list_add and inspect, ending at the failing assertion.] Did we really want this method call loop in our program? Much easier to analyse. Provides an overview of program behaviours, some of which may be unexpected. 4
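A learned DFA of this kind is easy to query programmatically. A minimal sketch in C, with an invented alphabet and transition table that only loosely mirror the diagram (this is not the tool's actual output format):

```c
#include <assert.h>
#include <string.h>

/* Illustrative only: the learned DFA as a transition table over an
 * alphabet of method names.  State -1 is the dead (rejecting) state. */
enum { N_STATES = 3, N_SYMS = 3 };
const char *syms[N_SYMS] = { "gl_read", "gl_insert", "inspect" };
/* delta[state][symbol]; -1 = dead state */
int delta[N_STATES][N_SYMS] = {
    {  1, -1, -1 },  /* start: expect gl_read                          */
    { -1,  1,  2 },  /* after gl_read: loop on gl_insert, then inspect */
    { -1, -1, -1 },  /* accepting: the failing assertion is reached    */
};
int accepting[N_STATES] = { 0, 0, 1 };

int sym_index(const char *s) {
    for (int i = 0; i < N_SYMS; i++)
        if (strcmp(syms[i], s) == 0) return i;
    return -1;
}

/* Returns 1 iff the call sequence is in the language of error. */
int accepts(const char **trace, int n) {
    int q = 0;
    for (int i = 0; i < n; i++) {
        int a = sym_index(trace[i]);
        if (a < 0 || q < 0) return 0;  /* unknown call or dead state */
        q = delta[q][a];
    }
    return q >= 0 && accepting[q];
}
```

The self-loop on gl_insert is exactly the "method call loop" the slide asks about: the table makes it visible at a glance.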
  14. 14. Implementation Paths are formed from software counterexamples (method calls that lead to a failing assertion in a program). Our software learns these counterexamples via the L* algorithm [Angluin, 1987] (where the oracle is a model checker). Membership queries pertain to individual counterexamples, while conjecture queries pertain to full automata. Supported by a Google faculty research award. 5
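The two query types can be sketched as an oracle interface in C. Everything here is illustrative: the function names are invented, and a simple string predicate stands in for the model checker; the target "language of error" is just "any trace ending in a call to inspect".

```c
#include <assert.h>
#include <string.h>

/* Membership query: does this single call sequence drive the program to
 * the failing assertion?  (In the real setting, answered by running the
 * model checker on the program restricted to this trace.) */
int membership_query(const char **trace, int n) {
    return n > 0 && strcmp(trace[n - 1], "inspect") == 0;
}

/* A hypothesis automaton, reduced here to a predicate on traces. */
typedef int (*hypothesis_fn)(const char **trace, int n);

/* An always-accepting hypothesis, as an example of a wrong conjecture. */
int accept_all(const char **t, int n) { (void)t; (void)n; return 1; }

/* Conjecture query: is this hypothesis correct?  (In the real setting,
 * the model checker searches for a trace on which the program and the
 * hypothesis disagree.)  Here we probe a tiny fixed set of traces. */
const char *probes[][2] = {
    { "gl_read", "inspect" },
    { "gl_read", "gl_insert" },
};

/* Returns the index of a counterexample probe, or -1 if none disagrees. */
int conjecture_query(hypothesis_fn h) {
    for (int i = 0; i < 2; i++)
        if (h(probes[i], 2) != membership_query(probes[i], 2)) return i;
    return -1;
}
```

In L*, a counterexample returned by the conjecture query is fed back into the observation table, refining the hypothesis until the oracle accepts it.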
  15. 15. Case study: Automatic merging Unexpected behaviours are particularly prevalent when code is automatically merged: main { ... functionA(); functionB(); ...} (a) Source main { ... functionA(); ... functionZ();} functionZ() { functionB();} (b) Branch A main { ... functionA(); functionB(); functionC(); ... } (c) Branch B main { ... functionA(); ... functionZ(); } functionZ() { functionB(); functionC(); ...} (d) Merged 6
  18. 18. Capitalising on automata representation (1) Our software uses an automaton representation to draw the developer’s attention to the changes introduced by the merge. First we generate three automata: A1 from Branch A (B1), A2 from Branch B (B2), and AMerged from the merged code (P). We then compute the following: AMerged \ A1 and AMerged \ A2, in order to show the new behaviours. 7
  19. 19. Capitalising on automata representation (2) [Figure 1: AMerged \ A1, or the behaviour not in Branch A. Figure 2: AMerged \ A2, or the behaviour not in Branch B.] Subtracting the union of A1 and A2 (the common behaviour) would also allow us to summarise all the new behaviour introduced by the merge. 8
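The set-difference step can be sketched in C: run two complete DFAs over the same alphabet in lockstep (the product construction) and accept exactly when the first accepts and the second rejects. The toy transition tables below are invented for illustration, not the automata from the slides: A accepts traces containing symbol 1, and B accepts traces containing symbol 0.

```c
#include <assert.h>

#define NSYM 2  /* toy alphabet: two method-call symbols, 0 and 1 */

int dA[2][NSYM] = { {0, 1}, {1, 1} };  int accA[2] = { 0, 1 };
int dB[2][NSYM] = { {1, 0}, {1, 1} };  int accB[2] = { 0, 1 };

/* Is the word in L(A) \ L(B)? */
int in_difference(const int *word, int n) {
    int qa = 0, qb = 0;                 /* the product state (qa, qb) */
    for (int i = 0; i < n; i++) {
        qa = dA[qa][word[i]];
        qb = dB[qb][word[i]];
    }
    return accA[qa] && !accB[qb];       /* accept in A, reject in B */
}
```

This is why the automaton representation pays off: difference, union and intersection are all standard, efficient product constructions.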
  22. 22. Capitalising on automata representation (3) Why an automaton? 1. Processing the source code directly in order to achieve a similar representation is likely to be inefficient (operations on automata are well established). 2. The automata representation is highly intelligible. 9
  23. 23. Playing Hide-And-Seek
  27. 27. Overview (1) Problem: Network attacks are becoming more frequent. Potential solution: Construct formal decision-making models (e.g. game-theoretic frameworks) that capture network security scenarios, in order to aid automated response (with respect to automatic processing, solution, and so on). An interesting class of network security models: network security games (NSGs). • Typically consider the interactions between an attacker and a defender. A common approach to deriving an NSG model is to apply existing types of games to unexplored network security problems. 10
  30. 30. Overview (2) Unexplored network security problem: Multiple node attacks (e.g. botnets and attack pivots). How do we link multiple node attacks to an existing type of game? The link: multiple node attacks exhibit the two-sided search problem (looking for something that does not want to be found, such as the bots in a botnet from the defender's perspective, or hidden, sensitive resources from the attacker's perspective) with multiple hidden entities. 11
  33. 33. Overview (3) Search games are designed to model and investigate the two-sided search problem, as interactions between a hider and a seeker. Hide-and-seek games, a subset of search games, are designed to do this for multiple hidden objects. Initial proposal: It is logical to study hide-and-seek games in order to study multiple node attacks [Chapman et al., 2014]. • The hider is the defender, and the seeker is the attacker, or vice-versa. 12
  36. 36. Hide-And-Seek Games Different permutations on the same basic model. The permutation of interest to us:
  • Two competing players: the hider and the seeker
  • A search space; for our purposes, a network graph
  • Hidden objects to be concealed on the network
  • Some cost to the seeker for undertaking a search; the hider is rewarded in an inverse amount
  Different strategies are explored for both the hider and the seeker. This model is simple, but already promising in what it can capture of a multiple node attack. Richer variants of the model are natural, so why aren't they explored? ‘Complexity’. 13
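The basic model above can be sketched in C. The cost function is the only substantive part; the node count, the object count and the probe orders are invented for illustration, and the search space is flattened to an array of nodes rather than a graph.

```c
#include <assert.h>

#define N 5   /* nodes in the network     */
#define K 2   /* objects the hider places */

/* The seeker probes nodes in the given order, paying one unit per
 * probe, until all K hidden objects are found.  The hider's reward is
 * the seeker's total cost (the inverse-payoff rule from the slide). */
int seeker_cost(const int hidden[N], const int order[N]) {
    int found = 0, cost = 0;
    for (int i = 0; i < N && found < K; i++) {
        cost++;                        /* each search costs one unit */
        if (hidden[order[i]]) found++; /* an object is concealed here */
    }
    return cost;
}
```

Even this stripped-down form shows the strategic tension: the hider wants placements that force long searches, while the seeker wants probe orders that find objects early.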
  39. 39. Methodology We increase the richness of the model, and thus what it can capture of the security domain (e.g. timesteps, repeated interactions). We compensate for any increase in complexity by using an Empirical Game-Theoretic Analysis (EGTA) approach: we estimate the payoff values associated with different strategies by realising computational representations of them and evaluating their performance in simulation. • This also allowed us to contribute a computational platform, which can be used as the basis for Distributed Research Games (more at cyberhands.co.uk). The performance of different strategies provides the basis for heuristics that can be applied to real security applications. 14
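The EGTA step can be sketched as follows: rather than deriving payoffs analytically, play each (hider strategy, seeker strategy) pair many times in simulation and average the outcomes into one cell of an empirical payoff matrix. The strategy names and parameters here are invented, not those of the HANDS platform.

```c
#include <assert.h>
#include <stdlib.h>

#define N 8            /* nodes in the network        */
#define TRIALS 10000   /* simulated plays per cell    */

typedef int (*hider_fn)(void);   /* returns a node to hide at      */
typedef int (*seeker_fn)(int t); /* returns the t-th node to probe */

int hide_fixed(void)     { return 3; }        /* always the same node */
int hide_random(void)    { return rand() % N; }
int seek_in_order(int t) { return t; }        /* probe 0, 1, 2, ...   */

/* One cell of the empirical payoff matrix: the hider's average payoff
 * (= the seeker's average search cost) over many simulated plays. */
double estimate_payoff(hider_fn hide, seeker_fn seek) {
    long total = 0;
    for (int i = 0; i < TRIALS; i++) {
        int h = hide();
        int cost = 0;
        for (int t = 0; t < N; t++) { cost++; if (seek(t) == h) break; }
        total += cost;
    }
    return (double)total / TRIALS;
}
```

Filling the full matrix means calling estimate_payoff once per strategy pair; equilibrium analysis is then run on the resulting empirical game instead of the intractable analytical one.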
  43. 43. Results (1) Multiple interaction game: the same attacker (hider) and defender (seeker) meet each other multiple times. It is natural for the defender to keep data on the actions of an attacker, to help plan future strategies, by observing how the attacker interacts with the environment (e.g. where objects are hidden). It is therefore natural for the attacker to attempt to manipulate this data (i.e. to switch the source of the data from the environment to themselves). Finding: deceptive strategies are not effective if the defender is sophisticated at determining the source of the data (i.e. determining when manipulation is being attempted). 15
  44. 44. Results (2) [Figure: set payoff of the hDeceptive and hRandomSet hider strategies against the sHighProbability seeker strategy; the differences are significant (***).] 16
  45. 45. Storing and processing MT300s
  46. 46. Storing and processing MT300s (1) Brief: “To design and build a distributed ledger POC system to process and store proprietary messages for inter-subsidiary forex transactions (MT300s) internal to a major Fortune 500 financial institution.” • Taking pairs of messages about forex transactions (e.g. £ → $; $ → £), and storing them on the ‘blockchain’. Distributed ledger: a generalised term for a blockchain, emphasising that it is not only currency exchanges that can be stored in this sequential, secure and replicated way. 17
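The "sequential, secure and replicated" property can be illustrated with a toy append-only chain in C: each record commits to the hash of its predecessor, so altering a stored message invalidates every later link. FNV-1a stands in for a real cryptographic hash, and the MT300 payloads are invented; this is not the POC system's design.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a, seeded with the previous block's hash so each block commits
 * to the whole history.  Illustrative only: a real ledger would use a
 * cryptographic hash such as SHA-256. */
uint64_t fnv1a(const char *s, uint64_t seed) {
    uint64_t h = seed ^ 14695981039346656037ULL;
    for (; *s; s++) { h ^= (unsigned char)*s; h *= 1099511628211ULL; }
    return h;
}

#define MAX_BLOCKS 16
struct block { char msg[64]; uint64_t prev_hash, hash; };
struct block chain[MAX_BLOCKS];
int n_blocks = 0;

void append(const char *msg) {
    if (n_blocks >= MAX_BLOCKS) return;
    struct block *b = &chain[n_blocks];
    strncpy(b->msg, msg, sizeof b->msg - 1);
    b->msg[sizeof b->msg - 1] = '\0';
    b->prev_hash = n_blocks ? chain[n_blocks - 1].hash : 0;
    b->hash = fnv1a(b->msg, b->prev_hash);  /* commit to msg + history */
    n_blocks++;
}

/* Recompute every link; any tampered message breaks the chain. */
int verify(void) {
    for (int i = 0; i < n_blocks; i++) {
        uint64_t prev = i ? chain[i - 1].hash : 0;
        if (chain[i].prev_hash != prev) return 0;
        if (chain[i].hash != fnv1a(chain[i].msg, prev)) return 0;
    }
    return 1;
}
```

Appending the two legs of a forex transaction as consecutive records gives exactly the paired-message storage the brief describes, with tampering detectable by re-verification.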
  47. 47. Storing and processing MT300s (2) Why? Intermediate message processors add time and money. Cynically: to become familiar with the technology that may one day supplant them. Output: a platform that: • Integrates a range of different technologies to achieve its aim (e.g. BigChainDB, ErisDB (permissioned chains based on Ethereum / the EVM) and Tendermint). • Focuses on scalability and throughput. Research into inter-chain interaction for processing and storing data (e.g. using a separate chain to store filtered transactions). 18
  51. 51. Provenance Themes A recurring theme of both conceptual provenance and data provenance in my work:
  • Learning the language of error: understanding the functions that have had an impact on input data, using a graph-based representation, and how this has led to an error.
  • Playing Hide-And-Seek: understanding the origin of data in order to make strategic decisions.
  • MT300 Processing: using a distributed ledger (blockchain) to provide a secure historic record of all the actions involving an entity¹.
  ¹ Lots of research to be done at the intersection here! 19
  52. 52. Summary (1) Experience of, and achievements as part of, projects that require not only good development skills but also research capabilities. Strong programming ability. Wide range of experience working with different systems, some large in scale or designed to be scalable; in particular, systems that have required me to consider how to facilitate communication between heterogeneous entities (e.g. the learn tool, the HANDS platform, distributed ledger projects). Ph.D. with a focus on game theory, artificial intelligence, and elements of learning. 20
  53. 53. Summary (2) Themes of provenance and graph-based representation throughout my work. Additional experience as a member of teaching academic staff at King’s: significant teaching responsibilities, in addition to administrative and pastoral responsibilities. Some system development as part of this role. 21
  54. 54. References
  Angluin, D. (1987). Learning regular sets from queries and counterexamples. Information and Computation, 75(2):87–106.
  Chapman, M., Chockler, H., Kesseli, P., Kroening, D., Strichman, O., and Tautschnig, M. (2015). Learning the language of error. In International Symposium on Automated Technology for Verification and Analysis, pages 114–130. Springer.
  Chapman, M., Tyson, G., McBurney, P., Luck, M., and Parsons, S. (2014). Playing hide-and-seek: an abstract game for cyber security. In Proceedings of the 1st International Workshop on Agents and CyberSecurity, page 3. ACM. 22
