SE and AI: A Two-way Street
26th May 2013
John A Clark
Dept. of Computer Science
University of York
York, UK
Why am I here? Why are you here?
 Aim to inspire progress in software engineering, in artificial
intelligence, and at the interface between the two disciplines.
 Some aspirations/stretch goals for the next 20 years.
 Some are clear.
 Some are rather loose.
 Some may be met fully.
 Others are harder, but in trying to meet them….
 Something Good Will Happen.
 Target the SE and AI communities, both individually and to inspire
collaboration.
 Many have roots in practical applications but satisfying them may
need theoretical advances in both disciplines.
Deviation!
 Inputs from Simon M Poulding, Mark Harman, Edmund Burke and Xin
Yao (my DAASE colleagues).
 It is better to travel than to arrive.
 The map is not the terrain.
 …..
 The talk is not the paper!
 I’ve kept many of the challenges but have thrown in more
observational material and some questions.
 KEYNOTE NIRVANA:
 Get to say things that would never get past referees!
 This is a health warning.
The challenge of proof automation:
striking at the heart of rigorous
software engineering
Moore’s Law for Formal Reasoning for
Software
 Rigorous software development has a long history in CS
 Turing’s 1949 paper has a proof of correctness
of a program with two nested loops and outlines
a more general proof approach.
 1960s saw classic works by Floyd and Hoare: {P} Q {R}
 1970s and 1980s: VDM, Z, and B, CSP and CCS.
 Tool support has matured, including automated reasoning.
 Tools generate “verification conditions” that must be proved for the software to be
demonstrably correct.
 Most are dispatched automatically but the remaining cases require manual
and expensive proof effort.
 Technological improvements here would greatly facilitate take-up. This brings us to
our first challenge.
 Challenge. Demonstrate for 20 years a 20% year-on-year reduction in the
residual manual proof effort that must be expended to produce formally
verified systems.
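To make “verification condition” concrete, here is a toy sketch (an illustration, not an example from the talk or from any particular tool): a Hoare triple {P} Q {R} for a simple summing loop, with Python assertions standing in for the proof obligations a prover would have to discharge.

# Toy Hoare triple {P} array_sum {R}: the asserts stand in for the verification
# conditions a tool might generate; names and obligations are illustrative only.
def array_sum(xs):
    # Precondition P: xs is a finite list of integers.
    total = 0
    i = 0
    # Loop invariant I: total == sum(xs[:i]) and 0 <= i <= len(xs)
    while i < len(xs):
        assert total == sum(xs[:i])   # VC: the invariant holds at the top of each iteration
        total += xs[i]
        i += 1
    assert total == sum(xs)           # VC: invariant plus loop exit (i == len(xs)) imply R
    return total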
Moore’s Law for Formal Reasoning for
Software
 We seek a software engineering proof equivalent of Moore’s Law. This is
exponentially (geometrically) ambitious (a 20% year-on-year reduction sustained for
20 years leaves roughly 0.8^20, about 1%, of today’s residual manual proof effort) yet
persistently feasible.
 “Classic AI” and undoubtedly “Classic CS” in the service of software
engineering.
 NOTA BENE:
 In many successful toolsets developers build powerful domain specific theories for
their provers.
 An obvious example here is the ASTREE toolset (supporting analysis of Airbus C
programs).
 This has an important place in any way forward, but we observe also that current
theorem proving technology seems to draw little on the technologies and ideas that
AI has to offer (e.g. a raft of adaptive learning techniques).
 Provision for better tools:
 Automated proof tactic development via large scale HPC experimentation?
 Computer crowd sourcing?
 Data mining of proof strategies and understanding their applicability?
 What do people find difficult?
Impact Pragmatics
 Take a REAL SYSTEM that has been subject to formal
development and use that as a case study.
 Tokeneer: a security system developed using Z and SPARK
Ada.
 Part of the motivation is that EVERY TOOL has its own
idiosyncrasies/weaknesses and patterns of them.
 Perfectly reasonable to expect AI to detect them and for tool
environmental improvements to be gained.
 There are real opportunities to do some work that will grab
attention.
Learning from Software Engineering
 Software engineering knows the power of domain specificity
or focus more generally. There is an age-old tension
between generality/expressiveness and tractability.
 Witness tools that resolve issues of:
 Freedom from deadlock (e.g. via model checking)
 Exception-freeness (e.g. via abstract interpretation).
 The latter may still suffer from false-positive issues.
 THEY HAVE WELL DEFINED NARROW TARGETS AND
THEY AIM TO DELIVER ON THEM.
 YOU DON’T HAVE TO DO FULL FORMAL DEVELOPMENT.
 SOLVING SMALL PROBLEMS WELL IS GOOD
Software Engineering Helping AI to Help
Software Engineering: Ratcliff, White and Clark.
Here we use evolutionary
computation to evolve candidate
invariants from trace data – i.e.
what Daikon does.
Generates thousands!!!!!!!
Not all are interesting.
But see which invariants are
broken by mutant programs.
Some are very special to the
original program. Those are
INTERESTING
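A minimal sketch of this filtering idea, assuming hypothetical helper names (this is not the Ratcliff, White and Clark implementation): keep only candidate invariants that hold on the original program's traces yet are broken on at least one mutant's traces, and rank them by how many mutants they distinguish.

# candidates: predicates over program states; traces: lists of recorded states.
def holds_on(invariant, traces):
    return all(invariant(state) for trace in traces for state in trace)

def interesting_invariants(candidates, original_traces, mutant_trace_sets):
    kept = []
    for inv in candidates:
        if not holds_on(inv, original_traces):
            continue                    # not an invariant of the original at all
        broken = sum(1 for mt in mutant_trace_sets if not holds_on(inv, mt))
        if broken > 0:                  # special to the original: mutants violate it
            kept.append((inv, broken))
    return sorted(kept, key=lambda pair: -pair[1])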
The challenge of trustable real-
time ai
trustable real-time ai
 Modern critical systems (e.g. those with safety implications) will increasingly
use AI to deliver the required services.
 Prototype driverless cars use image
processing techniques to detect humans and avoid collisions.
 A common claim: “the non-determinism of many AI
algorithms makes their application in critical systems
difficult”.
 It is the inability to provide guaranteed envelopes of system behavior that is
problematic; stochastic algorithms, for example, would be fine, provided you could
rigorously argue that their behavior is satisfactorily bounded.
 Functional and non-functional correctness are both relevant here. (Dealing with
non-functional performance and resource trade-offs forms an important component of
the DAASE project, most typically as part of the collaborative work on adaptive
automated software engineering.)
 Challenge. Demonstrate across a range of fundamental AI algorithms formal
machine assisted proofs of correctness and scientifically justified
predictive models of functionality-timing-memory-power-other tradeoffs.
Principle
people and computers are not the
same
Understand it, live with it, and
then embrace it
Or…
Vive la (les) Difference(s)!
Hard for humans
 Humans do some things well, and some things badly, and some
things not at all.
 Forget the doom and gloom. A lot of software works and works
acceptably well. This is largely due to human efforts. We humans
actually can do a fair job….But….
 Ask a human to write an image classifier that distinguishes pictures
containing cats from pictures containing dogs. Rather hard, and
standard specification and refinement techniques are not much use.
 Furthermore, ask them to write such a binary classifier that need only work
80% of the time. Che?
 It’s just not the sort of thing we do well at all.
 But why should we care?
Vive la (les) Difference(s)
Making n-version programming work!
 Critical systems hardware: majority voting of a 2-out-of-3 architecture allows
continued operation in the presence of a single hardware fault.
 The software variant of this idea is more controversial. It is questionable
whether independence holds between developing teams, and teams work from
the same, possibly flawed, specification.
[Diagram: input goes to all processors; Prog 1, Prog 2 and Prog 3 each compute a result; a majority vote is taken on their outputs.]
Vive la (les) Difference(s)
Making n-version programming work!
 Automated programming may actually make the idea more
palatable.
 Programming teams may make the same assumptions and the
same mistakes.
 But automated program discovery techniques can give sets of
programs with complementary weaknesses.
 Only need the majority to be correct on any training and
subsequent example.
 Ensemble based approaches… (a minimal voting sketch follows after this list).
 Challenge. Demonstrate a credible approach to N-version automated
programming that is scientifically grounded and capable of satisfying
assurance requirements of appropriate regulatory authorities.
 Suitable applications will have to be identified. This challenge has
roots in both SE and AI.
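For concreteness, a minimal voting sketch (the variants below are hand-written stand-ins for automatically discovered programs; nothing here is a prescribed design):

from collections import Counter

def n_version_run(variants, inputs):
    # variants: callables implementing the same specification; vote per input.
    results = []
    for x in inputs:
        outputs = [v(x) for v in variants]
        winner, votes = Counter(outputs).most_common(1)[0]
        if votes <= len(variants) // 2:
            raise RuntimeError(f"no majority for input {x!r}: {outputs}")
        results.append(winner)
    return results

# Three stand-in "variants" of absolute value with (ideally) complementary weaknesses.
variants = [lambda x: abs(x), lambda x: x if x >= 0 else -x, lambda x: (x * x) ** 0.5]
print(n_version_run(variants, [3, -4, 0]))   # [3, 4, 0]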
Vive la (les) similitude(s)
Making computers like humans!
 Humans are expensive and get bored very quickly.
 Great need for human testers in many systems.
 Long-standing quest: to create a system that cannot be
distinguished from a human.
 For most people the challenge was for humans to create such a
system.
 Challenge. Bring AI to bear to create high performing proxies for
humans for specific purposes?
 What we really need is automated human compilation. (Humans are
programs too by the way.)
 But this should also allow for the “dumb user” – the one who does
something that screws the system up in an unanticipated way.
 Abstractly, this reduces to modeling of traces/sequences – what is the
range of approaches for this?
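One of many possible approaches, sketched purely for illustration (the action names are invented): learn a first-order Markov model of user action sequences from recorded traces, then sample it to drive the system as a synthetic user.

import random
from collections import defaultdict

def learn_markov(traces):
    # Count observed action-to-action transitions across all recorded user traces.
    counts = defaultdict(lambda: defaultdict(int))
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[a][b] += 1
    return counts

def sample_session(counts, start, length, seed=0):
    # Generate a synthetic session by sampling transitions in proportion to counts.
    rng = random.Random(seed)
    session, current = [start], start
    for _ in range(length - 1):
        nxt = counts.get(current)
        if not nxt:
            break                        # no observed successor: end the session
        actions, weights = zip(*nxt.items())
        current = rng.choices(actions, weights=weights)[0]
        session.append(current)
    return session

traces = [["login", "browse", "buy", "logout"], ["login", "browse", "browse", "logout"]]
print(sample_session(learn_markov(traces), "login", 5))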
trustable real-time ai
 Finally, as part of our drive for engineering trusted systems, we envisage
significant exchange of ideas on software testing.
 Challenge. Draw on the testing expertise of SE and AI to develop a
credible and scientific basis for the testing of complex systems.
 We envisage further uses of AI for stress testing of systems (but the
scientific justification for such testing may be some way off).
 In addition, enhanced fault based (e.g. mutation) testing will likely play an
important part in testing systems of systems based on AI technology (e.g.
agent based systems).
The Challenge of coming to terms
with resource abundance and
resource constraints
or
The sky’s (cloud’s) the limit
And
Just how low can you go?
Resource availability: Aunt Ada’s
Dividend Challenge
 Extraordinary computational power. The ‘cloud’ (however
constituted) is very much the topic of the day.
 One of our most computationally-minded august relatives, Aunt
Ada, has now retired from programming and has invested
deeply and successfully in cloud technology.
 Each year she receives a dividend of 1 billion processor
hours (with each processor clocking at approximately 1
GHz).
 She is free to use them as she sees fit, but they have to be
spent within one year. She wishes to support speculative
research in SE and AI.
 Challenge. Identify the problem from SE, from AI, or their
fusion that provides the best use of such resources.
The guilt trip power diet challenge
 Ever-increasing power consumption is a serious concern: some highly
developed societies now live in fear of power outages.
 GUILT TRIPPING: We aim to encourage and celebrate power-frugality with a
series of challenges.
 Challenge. For your favourite application or algorithm from SE, from AI, or their
fusion, demonstrate a year on year power consumption reduction of 20% for the
next 20 years.
 Challenge. (The 1 J Diet Challenge.) You are given 1 Joule of energy. Identify the
most ambitious task that can be completed using no more than 1 J.
 For the benefit of the less militantly frugal, we offer the corresponding 10 J
and 100 J Challenges. The above is intended as a playful attempt to spark
efforts in the area of low power functionality.
 The challenge may morph from real power to a more abstract model of power,
e.g. to a virtual mapping from instructions to power consumed.
 Yes, you’ll have to make hardware assumptions/have some common
computational model.
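As one illustration of such a virtual mapping (the per-event cost is an arbitrary assumption, not a proposed standard), one could charge a fixed virtual energy cost per traced Python line event:

import sys

VIRTUAL_JOULES_PER_LINE = 1e-9           # assumed cost of one traced line event

def measure_virtual_energy(func, *args):
    # Count line events in Python frames entered after tracing starts and
    # convert them to virtual energy; no real hardware power is measured.
    events = 0
    def tracer(frame, event, arg):
        nonlocal events
        if event == "line":
            events += 1
        return tracer
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)
    return result, events * VIRTUAL_JOULES_PER_LINE

def busy(n):
    total = 0
    for i in range(n):
        total += i
    return total

print(measure_virtual_energy(busy, 1000))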
The dark side
Principle: embracing the dark side
of ai can be more fun and highly
productive
challenge: to fully realise the
destructive capabilities of ai
Thinking within the box
 It is habitual to recommend THINK OUTSIDE THE BOX. In a
sense this is nonsense. The aim is not to impose unnecessary
constraints due to mental baggage (assumptions,
favourite techniques, etc.). THINK WITHIN THE BOX, but
do so in a way that gives a result we actually find
appealing.
[Figure: the self-constructed box shown within the real box.]
Thinking within the box: stressing
systems
 “Best” is best, but very personal. There is a lot of AI (and indeed OR)
related research in the optimisation arena.
 Provided we can engineer feedback we can optimise for what
are usually regarded as NEGATIVE performance or other
criteria – STRESS TEST THE SYSTEM.
 We already see examples – e.g. growing worst-case execution
times (various).
 But “stress” and “systems” are flexible beasts:
 Search-based software testing – grow test data that breaks
predicates (module pre-conditions, post-conditions).
 Push non-functional properties to their limits
 Push predicates (a la Daikon etc.) to their limits by aggressive test
data generation. Attacking products of AI with AI in order to
strengthen them!
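A minimal sketch of the stressing idea (the routine under test and the mutation operator are invented stand-ins): a hill climber that grows test inputs to maximise observed execution time; the fitness could equally be distance to violating a pre-condition or a Daikon-style predicate.

import random, time

def fitness(func, test_input):
    # "Badness" score: wall-clock execution time of the routine under test.
    start = time.perf_counter()
    func(test_input)
    return time.perf_counter() - start

def hill_climb_stress(func, initial, mutate, iterations=200, seed=0):
    rng = random.Random(seed)
    best, best_score = initial, fitness(func, initial)
    for _ in range(iterations):
        candidate = mutate(best, rng)
        score = fitness(func, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

def routine_under_test(xs):              # a toy quadratic-time routine
    return [x for x in xs for y in xs if x == y]

mutate = lambda xs, rng: xs + [rng.randint(0, 9)]   # mutation: lengthen the input
worst, seconds = hill_climb_stress(routine_under_test, [1, 2, 3], mutate)
print(len(worst), "elements,", seconds, "seconds")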
The Challenge of the New
Computing
New Will Be Big
Teleportation
 Quantum Information Processing (QIP) and Quantum Computing (QC) offer
us a radically different computational model and capability.
 Star Trek is with us now (sort of)! Here’s Brassard’s TELEPORTATION.
 Here’s an evolved one from Yabuki & Iba.
Grover’s Algorithm
 Quantum Information Processing (QIP) and Quantum
Computing (QC) offer us a radically different computational
model and capability.
 Here is Grover’s Search (a fundamental building block of QC) – it
allows you to search a database of size 2^N in order 2^(N/2) steps.
 Spector et al. evolved a GS circuit for two qubits
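For concreteness, a minimal statevector simulation of Grover’s search on two qubits (a textbook construction, not the circuit evolved by Spector et al.):

import numpy as np

n_qubits = 2
size = 2 ** n_qubits                     # database of size 2^N for N qubits
marked = 0b11                            # index of the marked item |11>

state = np.zeros(size); state[0] = 1.0   # start in |00>
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
state = np.kron(H, H) @ state            # uniform superposition over all items

oracle = np.eye(size); oracle[marked, marked] = -1      # phase-flip the marked item
s = np.full(size, 1 / np.sqrt(size))
diffusion = 2 * np.outer(s, s) - np.eye(size)           # inversion about the mean

iterations = max(1, int(np.floor(np.pi / 4 * np.sqrt(size))))   # ~2^(N/2) steps
for _ in range(iterations):
    state = diffusion @ (oracle @ state)

print("P(measure marked item) =", abs(state[marked]) ** 2)      # ~1 for two qubits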
Shor’s Quantum Fourier Transform
 Here is a Quantum Fourier Transform (the fundamental building
block of Shor’s algorithm) – this is what enables factorization in
polynomial time, breaking factorization-based cryptography.
Massey et al Quantum Fourier
Transform Generation
New Computing
 If we were being cruel we could say that the AI community
(including me!) is capable of re-discovering what the physicists
have already discovered.
 But there are actually very few genuinely different quantum
algorithms around. A real opportunity….
 Challenge. Using AI techniques generate new quantum
algorithms to solve problems of acknowledged importance.
 Question: what does the new computing offer current
software engineering in terms of verification capability?
Dynamic Adaptive Automated Software
Engineering
 A major EPSRC “programme grant” (around £6.7m) over four sites:
UCL (Mark Harman PI), Birmingham (Xin Yao), Stirling (Edmund
Burke) and York (John Clark - or me).
 Major focus on automation and adaptivity (off-line and on-line)
 Will have to face many of the problems discussed at this workshop:
 Squishiness (in many forms)!
 Uncertainty and its resolution.
 Pareto and other MO approaches and their applicability.
 Ascertaining the limits of machine learning and what can be
justified/reasoned about.
Acknowledgements
 Sponsors
 DAASE: Dynamic Adaptive Automated Software
Engineering. EPSRC grant EP/J017515
 The Birth, Life and Death of Semantic Mutants
EPSRC grant EP/G043604/1
 Many thanks to: Simon M Poulding, Mark Harman,
Edmund Burke and Xin Yao
