Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Software Engineering Challenges in building AI-based complex systems


Published on

Development of AI-based systems goes far beyond using specific AI-algorithms. The development itself is becoming more complex since data and algorithms become dependent. This presentation lists some of new challenges that AI-developers meet.

Published in: Software
  • Login to see the comments

Software Engineering Challenges in building AI-based complex systems

  1. 1. Software Engineering Challenges in building AI-based complex systems NTNU Seminar: Software Engineering, AI, Ethics 2019-03-15 Ivica Crnkovic,
  2. 2. Ivica Crnkovic Chalmers University of Technology | Gothenburg University Professor in Software Engineering Director of ICT Area of Advance@Chalmers Director of Chalmers AI Research Centre (CHAIR) (Mats Nordlund ©)
  3. 3. Chalmers AI Research Centre (CHAIR) Ethics Sustainability Equality Innovation 3 COLLABORATION PROJECTS - with industrial partners CHALMERS PROJECTS Advancing CHAIR activities A new initiative from Chalmers 370 MSEK 2019-2028 2019-03-18 Chalmers University of Technology 3 AI FOUNDATION AI Theories and Algorithms AI ENABLERS AI-based Systems and Software APPLIED AI Transport Automation and IoT Life science and Health E.
  4. 4. Software Engineering forAI Ethics Sustainability Equality Innovation 4 COLLABORATION PROJECTS - with partners CHALMERS PROJECTS Advancing CHAIR activities 2019-03-18 Chalmers University of Technology 4 AI FOUNDATION AI ENABLERS APPLIED AI Software Engineering for AI
  5. 5. Systems are getting AI - Example: autonomous driving architecture (Mats Nordlund ©)
  6. 6. Challenge:AI complexity iceberg AI Machine Learning Data Science Statistics, analysis, visualisation System Characteristics Performance, Real-time System constraints Storage capacity, memory, computation Data engineering Storage, finding, processing accuracy, availability Computational Science Algorithms, simulation System Requirements Reliability, Security, Safety Software Engineering AI-based development, lifecycle, variations trustworthiness, ethical values Data Value Domain knowledge
  7. 7. 3/18/2019 Chalmers 7 • Trend – Overall use of AI in software applications and Software-intensive systems • Many promising results on the functional/feature level • Enormous expectations to build system that use AI in obtaining features • New questions – of non technical character appear • Which real problems can be solved using data-driven and AI-based approaches? • How to interpret data? • Who is owner of data? • What are the ethical aspects of using data, and allowing machine to decide? • New questions of technical nature • How to efficiently collect, store, process, analyse, and present data? • How to efficiently build the AI-based systems? • How to ensure dependability/trustworthy of such systems? • WHAT KIND OF SOFTWARE ENGINEERING SUPPORT IS NEEDED? Expectations – overall use of AI
  8. 8. 3/18/2019 Chalmers 8 Presentation based on 1 (literature) • Machine Learning: The High-Interest Credit Card of Technical Debt, D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, M. Young • Software Engineering Challenges of Deep Learning, Euromicro SEAA 2018, A. Arpteg, B. Brinne, l. Crnkovic-Friis, J. Bosch 2 (empirical research at Swedish companies) • A taxonomy of software engineering challenges for machine learning systems: An empirical investigation, XP2019, L. Lwakatare, A. Raj, J. Bosch, H. Holmström, I.Crnkovic • Ongoing work in “Holistic DevOps Approach” project Software Engineering Challenges in development of AI- based systems
  9. 9. 3/18/2019 Chalmers 9 System architecture of non-AI systems (example ) Platform Middleware Subsystem A HW Subsystem B Subsystem C C1 C2 C3 C4 C3 Subsystem and components Component-based and service-based approach - Components - Encapsulation of data - Encapsulation of functionality - Dependency between components defined and controlled
  10. 10. 3/18/2019 Chalmers 10 AI-System - system architecture (example) Platform Middleware Subsystem A HW Subsystem B Subsystem C C1 C2 C3 C4 C3 Subsystem and components Components – black boxes - Components - Encapsulation of AI-based functionality - Dependency between components defined and controlled - What about data?
  11. 11. 3/18/2019 11 AI- based components C1 Data – used for ML controlled Controlled? • The results depend not only on the algorithms and controlled data but also on uncontrolled/unknown data • The AI-based functions are not continuous: small change of data can cause big changes
  12. 12. 2019-03-18 Chalmers University of Technology 12 Data-related challenges 1. Entanglement (Data fusion) 2. Data dependencies • Unstable data dependencies • Underutilized Data Dependencies 3. Hidden Feedback Loops 4. Undeclared Consumers 5. Correction Cascades
  13. 13. 3/18/2019 Chalmers 13 • M= {f1, f2, …, fn} - system – a set of features f1…fn in an AI model Data-related challenges: 1. Entanglement (Data fusion)
  14. 14. The first version of AI system is easy to obtain, but making subsequent improvements is unexpectedly difficult. 3/18/2019 Chalmers 14 • M= {f1, f2, …, fn} - system – a set of features f1…fn in an AI model • Changing input data for fx, requires changes in values of fi – weights, importance, to get optimal result • Adding a new feature may require the same change that can cause unpredictable changes in the model Data-related challenges: 1. Entanglement (Data fusion) CACE principle: Changing Anything Changes Everything
  15. 15. 3/18/2019 Chalmers 15 • Code dependency – known as a technical debt (built-in problems) • Data dependencies – more complex Data-related challenges – 2. Data dependencies DATA DATA DATA AI-component AI-component
  16. 16. 3/18/2019 Chalmers 16 • Unstable data dependencies • Some data change over time (value, accuracy, precision) • A common mitigation strategy • Introduce versions of data sets • Implication • Version and configuration management challenges • Not only version and configuration management of functions (code) but also data • how to manage data dependency? No developed tools • Static analysis of systems with data dependencies - no tools Data-related challenges – 2. Data dependencies DATA DATA DATA AI-component AI-component
  17. 17. 3/18/2019 Chalmers 19 • The system can optimise for the feedback Data-related challenges – 3. Hidden Feedback Loops DATA DATA AI-component
  18. 18. 3/18/2019 Chalmers 20 • Changes of data can have unexpected consequences Data-related challenges – 4. Undeclared Consumers DATA DATA AI-component C1 AI-component C2 Problem if C1 changes and Change data DATA
  19. 19. 3/18/2019 Chalmers 21 • Model m for problem P • Model m’ for problem P’ (P´- P = DP) • Often used solution • m’(P’) = m(P) + Dm - by changes of data used for P break the relation between m’ and m Data-related challenges: 5. Correction Cascades DATA DATA AI-component AI-component DATA DATA AI-component AI-component Dependency of data should be controlled
  20. 20. 3/18/2019 Chalmers 22 • Heterogeneity of data (different formats, accuracy, semantics) and use of standard ML functions require a lot Glue code • 95% of code in AI-based systems is a glue-code (empirical data) • Requires • Frequent refactoring of code • Re-implementing AI models • Pipeline Jungles • ML-friendly format data become a jungle of scrapes, joins, and sampling steps, intermediate files • Requires – a close team work of data and domain engineers System-design anti-patterns (1)
  21. 21. 3/18/2019 Chalmers 23 Dead Experimental Code paths • AI solution requires a lot of experimentation • A lot of code that will not be used later • Problems • Dead code • Version management – how to preserve useful configuration branches, and remove unnecessary System-design anti-patterns (2)
  22. 22. 3/18/2019 Chalmers 24 Managing Changes in the External World System Cloud Data Local Data Local Data Local Data Historical data Used in ML Continuous change of data Challenge: models and system behaviour dependent of data Examples: - Threshold changes - Correlation between data Requirements: Continuous monitoring of data and system. Continuous test.
  23. 23. 3/18/2019 Chalmers 25 Life cycle challenges ML Code
  24. 24. 3/18/2019 Chalmers 26 The efforts in development cycle Amount of effort Amount of attention
  25. 25. 2019-03-18 Chalmers University of Technology 27 Software Engineering Challenges to develop AI-base systems • empirical studies at Swedish industry
  26. 26. Requirements driven development - Regulatory features - Competitor parity features - Commodity features Outcome/data driven development - Value hypothesis - New ”flow” features - Innovation AI driven development - Minimize prediction errors - Many points in data set - Combinatorial explosion of alternatives continuous deployment behavior data System in operation AI component SW component continuous deployment behavior data Holistic DevOps Framework Holistic DevOps Framework Investigate the state of art and of practice
  27. 27. 2019-03-18 Chalmers University of Technology 29 Study 1: Overview of studied industrial DL systems A. Project EST: Real Estate Valuation B. Project OIL: Predicting Oil and Gas Recovery Potential C. Project RET: Predicting User Retention D. Project WEA: Weather Forecasting E. Project CCF: Credit Card Fraud Detection F. Project BOT: Poker Bot Identification G. Project REC: Media Recommendations
  28. 28. 2019-03-18 Chalmers University of Technology 30 Study 1: Challenges 1. Development challenges 2. Production challenges 3. Organizational & project challenges
  29. 29. 3/18/2019 Chalmers 31 • Experiment Management: how to keep track of all the contextual factors when running many experiments • Hardware, Platform, Source code, Configuration, Data sources, Training state • Difficult to predict behaviour of ML • Limited Transparency: neural networks give little insight into how and why they work • Troubleshooting and Testing: large libraries, lazy execution and lack of tooling complicate solving defects enormously. • Glue Code and Supporting Systems • Resource Limitations • Memory, CPU power, Storage Study 1: Development challenges
  30. 30. 3/18/2019 Chalmers 32 • Dependency Management. • Requirements on powerful computational resources that are continuously changing • Monitoring and Logging • Unintended Feedback Loops • Safety, security, and privacy Study 1: Production challenges
  31. 31. 3/18/2019 Chalmers 33 • Effort Estimation • Difficult to know when the models will be sufficiently good • Cultural Differences • Software developers, data scientists • Development process • Continuous changes • Issues with the compatibilities in changes Study 1: Organizational & project challenges
  32. 32. 2019-03-18 Chalmers University of Technology 34 Study 1: Engineering challenges of DL ■Dependency Management ■Glue code and supporting systems ■Unintentional feedback loops ■Monitoring and logging ●Troubleshooting ●Testing ●Limited transparency ●Experimental management ■ Organisational challenges ● Development challenges ◆ Development challenges ◆Effort estimation ◆Cultural differences Hard to solve Easier to solveLower business impact High business impact ●Resource Limitations ◆Privacy and Safety/security
  33. 33. 2019-03-18 Chalmers University of Technology 35 Study 2: Overview of studied industrial ML systems System Use case of ML components Software for automated driving Interpreting sensor data to understand the contained information at a high level AI web platform Predicting quality of end product based on different measurements from machines, IoT devices Automating tagging of sentiments in online music library Annotation web platform Collaborative annotation of training data and predicting quality of annotations Mobile network operations Predicting failures at site to give insights into mobile network operations e.g., predicting degradation of key performance indicators (latency, throughput) and predicting site’s sustenance to power outage Sales engagement web platform Automating information extraction from out-of-office reply to optimize communication between sales reps and prospects Experimentation web platform Models of ML components of e.g., web search engine, are compared using important metrics through A/B tests
  34. 34. 2019-03-18 Chalmers University of Technology 36 Study 2: A taxonomy of evolution of use of ML in industry Experimentation and prototyping Non-critical deployment Critical deployment Cascading deployment Autonomous ML components
  35. 35. 2019-03-18 Chalmers University of Technology 37 Study 2: Challenges summary Difficulty in ML/DL pipeline implementation e.g., toolchain for scheduling training jobs, diagnosis and monitoring of ML models Difficulty in debugging models and subsequently testing DL systems e.g., due to unexplainable/limited transparency of NN Unestablished KPIs for monitoring the quality of deployed models Partial trust in the results of ML/DL systems e.g., unreliable results make it unsuitable/require other techniques for regulated products Extreme overconfidence in technology and Limited number of persons with overlap skills between SE and ML Difficulty in eliciting ML/DL use case and desired outcome (s) Inability to take at start end-to-end ML/DL use case due to large scale and complex environment Difficulty in getting access to training data due to e.g., regulation, ownership by client or other teams. Limited approaches for selecting and cleaning training data from large data collection e.g., to annotate in automotive Limited approaches for validating training data e.g., measures for accuracy and consistency of annotations, identifying missing values Data Devel. Trust
  36. 36. 2019-03-18 Chalmers University of Technology 38 Study 2: Challenges mapped to the taxonomy A taxonomy of software engineering challenges for machine learning systems: An empirical investigation, XP2019, Lucy Ellen Lwakatare, Aiswarya Raj, Jan Bosch, Helena Holmström Olsson and Ivica Crnkovic
  37. 37. 3/18/2019 Chalmers 39 • New lifecycle and development approach • New technologies, new communities • Gaps between prototypes and full lifecycle support Example of question: • How to manage continuous deployment in trustworthy and efficient way? • How easily can an entirely new algorithmic approach be tested at full scale? • How precisely can the impact of a new change to the system be measured? • Does improving one model or signal degrade others? • How quickly can new members of the team be brought up to speed? A lot of new challenges in (software) development Conclusion
  38. 38. 3/18/2019 Chalmers 40