Future Advanced Testing Technology Workshop
Testing Technology and Practice in the age of LLM
Tokyo, Japan, November 1st - 2nd, 2024
• General Chair
– Hironori Washizaki, Waseda University, Japan
• Committee
– Naoyasu Ubayashi, Waseda University, Japan
– Nobukazu Yoshioka, QAML Inc. / Waseda University, Japan
– Hironori Takeuchi, Musashi University, Japan
1
Workshop supporters
2
Opening: IEEE Software Testing Technology
Development Trend
Hironori Washizaki
Professor, Waseda University
IPSJ-SIGSE, Chair
IEEE Computer Society, 2025 President
http://www.washi.cs.waseda.ac.jp/
Future Advanced Testing Technology Workshop
Tokyo, Japan, November 1st - 2nd, 2024
IEEE CS Technology Prediction Team (Chair: Dejan Milojicic)
5
Megatrends in IEEE Future Direction 2023 and IEEE-CS Technology Predictions 2024
IEEE CS Technology Prediction Team (Chair: Dejan Milojicic) https://www.computer.org/resources/2024-top-technology-predictions
• AGI technologies are deeply entangled with social, economic, and ecological aspects.
[Figure: megatrend areas — Next-Gen AI, Generative AI applications, Metaverse, Low-power AI accelerators]
Tech Predictions: Next Generation AI
Problems/demand:
• General Intelligence: Current AI systems are specialized and narrow. Evolving towards AGI requires interdisciplinary collaboration across computer science, engineering, ethics, and even philosophy.
• Trust and explainability: The black-box nature of AI can cause a reduction in interpersonal trust. We need technology that prevents deriving secrets from large language models and takes ethical considerations and data privacy into account.
• AI sustainability: As AI models keep growing, the excessive data-center load raises concerns about environmental impact. Increased model efficiency, improved accuracy, and greater flexibility are key.
• Human-centered AI: Next-generation AI should focus on enhancing human capabilities, for example by increasing the level of empathy.
IEEE CS Technology Prediction Team (Chair: Dejan Milojicic) https://www.computer.org/resources/2024-top-technology-predictions
Opportunities:
• Enhanced creativity in arts and design, accelerated design processes, and collaborative human-AI creative processes.
• Generative-AI-based revolution in personalized medicine, from drug discovery to tailored treatment plans.
• Personalized education and marketing that boost productivity.
• Improved customer support through natural conversational interactions, problem solving, and detailed product knowledge.
• Accelerated scientific discovery and 3D modeling.
HCAI, TAI and XAI
7
[Chamola+23] V. Chamola, et al., A Review of Trustworthy and Explainable Artificial Intelligence (XAI), IEEE Access, 11, 2023
• Responsible AI (RAI) development brings to the table issues including, but not limited to, fairness, explainability, and privacy in AI, and centers AI around humans [Tahael+23]
Trustworthy AI (TAI) aspects [Chamola+23]
Explainable AI (XAI) approaches [Chamola+23]
[Tahael+23] M. Tahael, et al., A Systematic Literature Review of Human-Centered, Ethical, and Responsible AI, arXiv:2302.05284v3, 2023
Human-centered AI (HCAI) [Tahael+23]
[Chamola+23] V. Chamola, et al., A Review of Trustworthy and Explainable Artificial Intelligence (XAI), IEEE Access, 11, 2023
[Noyori+23] Y. Noyori, H. Washizaki, et al., “Deep learning and gradient-based extraction of bug report features related to bug fixing time,” Frontiers in Computer Science, 5, 2023
XAI approaches and more
• XAI approaches in autonomous systems
[Chamola+23]
– Explanation and interpretations of DL models
– Observation-to-action guideline
– Causal inferences in/out of interpretations
– Goal-based forecasting
– Purpose/objective identification
• Technical and social reliability of
explanations
– E.g., fake explanation by surrogate models
and examples
• Counterfactual explanations
– What should be different in the input instance to change the outcome [Guidotti24]
– Needs to consider the constraint of ensuring that reasonable actions exist for as many instances as possible [Kanamori+24] (a minimal counterfactual-search sketch follows the references below)
8
[Guidotti24] Riccardo Guidotti, Counterfactual explanations and how to find them: literature review and benchmarking, Data Mining and Knowledge Discovery, 38, 2024
[Kanamori+24] K. Kanamori, et al., Learning Decision Trees and Forests with Algorithmic Recourse, ICML 2024
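To make the counterfactual idea above concrete, a minimal sketch of a brute-force counterfactual search for a tabular classifier; the predict callable, feature grid, and L1 cost are illustrative assumptions, not the methods of [Guidotti24] or [Kanamori+24]:

# Illustrative counterfactual search: find the cheapest small change to x's
# features that flips the black-box model's predicted class.
import itertools
import numpy as np

def find_counterfactual(predict, x, candidate_values, target_class, max_changed=2):
    """predict: 1-D feature vector -> class label; x: 1-D NumPy array;
    candidate_values: {feature_index: iterable of alternative numeric values} (illustrative)."""
    best, best_cost = None, np.inf
    features = list(candidate_values)
    for k in range(1, max_changed + 1):
        for subset in itertools.combinations(features, k):
            for values in itertools.product(*(candidate_values[i] for i in subset)):
                x_cf = x.copy()
                x_cf[list(subset)] = values
                if predict(x_cf) == target_class:
                    cost = np.abs(x_cf - x).sum()  # L1 "effort" of the change
                    if cost < best_cost:
                        best, best_cost = x_cf, cost
    return best  # None if no counterfactual exists within max_changed features

For a loan-approval model, for example, this returns the cheapest small feature change that flips a rejection into an approval; the recourse constraint in [Kanamori+24] additionally asks that such actionable changes exist for as many instances as possible.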
[Figure: example bug report excerpts ("using 19990914 build on win98 …", "with the same server the problem was with … to figure out what the actual bug …") contrasting short vs. long bug-fixing time [Noyori+23]]
[Figure: knowledge areas and topics in a body of knowledge link reference material, skills, competencies, and jobs/roles, supported by SWEBOK, SWECOM, EITBOK, learning courses, and Software Engineering Professional Certifications]
9
Guide to the Software Engineering Body of Knowledge (SWEBOK) [Washizaki24]
https://www.computer.org/education/bodies-of-knowledge/software-engineering
• Guides researchers and practitioners in identifying and building a common understanding of “generally accepted knowledge” in software engineering
• Foundation for certifications and educational curricula
• ‘01 v1, ‘04 v2, ‘05 ISO adoption, ‘14 v3, ’24 v4 just published!
[Washizaki24] H. Washizaki, eds., Guide to the Software Engineering Body of Knowledge (SWEBOK Guide), Version 4.0, IEEE Computer Society, 2024
SWEBOK Evolution from V3 to V4
• Modern engineering and practice updates; the BOK grows with recently developed areas
V3 knowledge areas: Requirements, Design, Construction, Testing, Maintenance, Configuration Management, Engineering Management, Process, Models and Methods, Quality, Professional Practice, Economics, Computing Foundations, Mathematical Foundations, Engineering Foundations
V4 knowledge areas: Requirements, Architecture, Design, Construction, Testing, Operations, Maintenance, Configuration Management, Engineering Management, Process, Models and Methods, Quality, Security, Professional Practice, Economics, Computing Foundations, Mathematical Foundations, Engineering Foundations
Highlighted updates in V4: Agile, DevOps, and AI and SE
[Figure: metamodel-based ML evaluation, visualizing issues (some elements Failed) and, after adding a repair strategy covering ML training and ML repair, visualizing the resolution (elements OK). The modeled object-detection example:]
– [ML.VP1 ← AI.VP1] Provide a reliable real-time object detection system for driving decision making on highways (incl. traffic sign detection and lane/vehicle detection)
– [ML.DS1] Procured datasets; [ML.DS2] Internal database from collection during operation
– [ML.DC1] Open and commercial datasets; [ML.DC2] Data collected during operation (image and identification result)
– [ML.F1 ← AI.D1/AI.D3] Bounding box for objects (incl. other vehicles or signs); [ML.F2 ← AI.D2] Ridge detection for lane detection
– [ML.BM1] Models will be developed, tested, and deployed to cars monthly
– [ML.PT1] Input: image from sensors; [ML.PT2 ← AI.D] Output: traffic signs, lane marking, vehicles, and pedestrians
– [ML.De1] Use prediction results for decision making in the self-driving system
– [ML.IS1] Using test data, achieve very high recall and high precision in the following conditions: night, rainy, and general; datasets are split in an 80:20 ratio (see the evaluation sketch at the end of this slide)
– [ML.MP1] Prediction should be made in batches / real time
– [ML.M1] Input data monitoring
SE4AI: System modeling and MLOps integration [Takeuchi+24][Husen+24]
12
[Husen+24] J. H. Husen, H. Washizaki, et al., Integrated Multi-view Modeling for Reliable Machine Learning-Intensive Software Engineering, Software Quality Journal, 32, 2024
[Takeuchi+24] H. Takeuchi, et al., Enterprise Architecture-based Metamodel for Machine Learning Projects and its Management, Future Generation Computer Systems, 2024
[Figure: related software engineering knowledge areas and activities — Requirements, Construction, Design, Test, Architecture, Operations, Economics, Models and Methods, Quality, Requirements analysis and design]
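To make the [ML.IS1] evaluation criterion above concrete, a minimal sketch, assuming scikit-learn, a generic classifier standing in for the object detector, and a per-sample condition label; all helper names and thresholds are illustrative, not values from the slide:

# Illustrative [ML.IS1]-style check: 80:20 split, then per-condition
# recall/precision thresholds (thresholds here are placeholders).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

def evaluate_per_condition(X, y, conditions, recall_min=0.95, precision_min=0.90):
    """X, y, conditions: NumPy arrays; conditions holds e.g. "night", "rainy", "general"."""
    X_tr, X_te, y_tr, y_te, _, cond_te = train_test_split(
        X, y, conditions, test_size=0.20, random_state=0)   # 80:20 split
    model = RandomForestClassifier().fit(X_tr, y_tr)
    y_pred = model.predict(X_te)
    results = {}
    for c in np.unique(cond_te):                             # evaluate each condition separately
        mask = cond_te == c
        results[c] = {
            "recall": recall_score(y_te[mask], y_pred[mask], average="macro"),
            "precision": precision_score(y_te[mask], y_pred[mask], average="macro"),
        }
    passed = all(r["recall"] >= recall_min and r["precision"] >= precision_min
                 for r in results.values())
    return passed, results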
SWEBOK: AI for SE, SE for AI
• Software engineering and AI are related to each other in basically two ways: software engineering for AI systems (SE for AI) and AI applications in software engineering (AI for SE)
• SE for AI
– There is a need for particular SE support for AI, such as interdisciplinary collaborative teams of data scientists and software engineers, software evolution focusing on large and changing datasets, and ethics and equity requirements engineering
– E.g., ML testing
• AI for SE
– Aims to establish efficient ways of building high-quality
software systems by replicating human developers’
behavior.
– E.g., AI/ML for software testing
13
[Figure: software engineering and AI, connected by “AI for SE” and “SE for AI”]
ML Testing: Approaches and challenges in industry [Rahman+23]
• ML model implementation testing
– Performance-based testing: sanity checks, performance against a benchmark/baseline, and cross-model/algorithm/language/platform testing (a minimal sketch follows the references below)
– Visualization
– Traditional unit testing and debugging
– Domain knowledge-based validation
• ML code defect detection
– Performance based symptoms
– Training behaviors
– Model output
• Challenges of testing ML applications
– Black-box nature of ML models
– Model’s robustness to errors
– Data quality
– Volatile performance
– Domain expertise
– Cost
– Lack of concrete methodology
– Interpretability, explainability
• Challenges of post-deployment testing
– Test data
– Performance
– Resource requirements
– System complexity
– Platform diversity
– Adaptability
– User satisfaction
14
[Rahman+23] M.S. Rahman, F. Khomh, A. Hamidi, J. Cheng, G. Antoniol, H. Washizaki, “Machine Learning Application Development: Practitioners’ Insights,”
Software Quality Journal, 31, 2023.
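As one concrete shape of the performance-based testing practices above, a minimal pytest-style sanity check against a baseline; load_model and load_eval_data are hypothetical project helpers, and the threshold is illustrative:

# Illustrative performance-based ML model tests (pytest style).
# load_model / load_eval_data are hypothetical project helpers, not a real API.
import numpy as np
from sklearn.metrics import accuracy_score

BASELINE_ACCURACY = 0.82   # e.g. accuracy of the previously released model (placeholder)

def test_candidate_beats_baseline():
    model = load_model("candidate")        # hypothetical helper
    X, y = load_eval_data("holdout")       # hypothetical helper
    acc = accuracy_score(y, model.predict(X))
    assert acc >= BASELINE_ACCURACY, f"accuracy {acc:.3f} is below the baseline"

def test_sanity_constant_input():
    # Sanity check: an all-zero, uninformative input should not be classified
    # with near-certain confidence.
    model = load_model("candidate")
    X, _ = load_eval_data("holdout")
    constant = np.zeros((1, X.shape[1]))
    assert model.predict_proba(constant).max() < 0.99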
ML Testing: Topics and approaches
• Bugs & debugging
• Explanations
• Quality testing
• Test architecture & languages
• Test case: Generation &
selection
• Testing metrics
• Testing methods [Sherin+19][Wan+24]
– Metamorphic testing (a minimal sketch follows the references below)
– Coverage based testing
– Adversarial testing
– Mutation testing
– Symbolic & concolic testing
– Multi-implementation testing
– Evolutionary computing
– Search-based testing
– Fuzzing
– Combinatorial testing
15
[Fernandez+22] S.M. Fernández, et al., Software Engineering for AI-Based Systems: A Survey. ACM Trans. Softw. Eng. Methodol. 31(2), 2022
[Wan+24] X. Wan, et al., Coverage-guided fuzzing for deep reinforcement learning systems, JSS, 210, 2024
[Sherin+19] S. Sherin, et al., A Systematic Mapping Study on Testing of Machine Learning Programs, arXiv:1907.09427, 2019
[Fernandez+22]
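To illustrate the first testing method in the list above, a minimal metamorphic-testing sketch for an image classifier; the model interface and the brightness relation are illustrative assumptions:

# Illustrative metamorphic test: the predicted class of an image classifier
# should not change under a mild brightness increase (the metamorphic relation).
import numpy as np

def brighten(image, delta=10):
    """Follow-up input: a slightly brighter copy of a uint8 HxWxC image."""
    return np.clip(image.astype(np.int16) + delta, 0, 255).astype(np.uint8)

def count_relation_violations(model, images):
    """model.predict maps a batch of images to class labels (assumed interface)."""
    violations = 0
    for img in images:
        source_label = model.predict(img[np.newaxis])[0]               # source test case
        follow_up_label = model.predict(brighten(img)[np.newaxis])[0]  # follow-up test case
        if source_label != follow_up_label:
            violations += 1
    return violations  # any violation signals a potential defect

The appeal of metamorphic testing here is that no labeled ground truth is needed; only the relation between source and follow-up outputs is checked.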
ML Testing: Taxonomy
16
[Song+22] Q. Song, et al., Exploring ML testing in
practice – Lessons learned from an interactive rapid
review with Axis Communications, CAIN 2022
[Zhang+22] J.M. Zhang, et al., Machine
Learning Testing: Survey, Landscapes and
Horizons, TSE, 48(1), 2022
AI/ML for Testing: Topics [Yang+22]
• Bug-related detection
• Bug localization
• Vulnerability detection
• Testing techniques
• Test case generation
• Program analysis
17
[Yang+22] Y. Yang, et al., A Survey on Deep Learning for Software Engineering, ACM Computing Surveys, 54(10), 2022
Generative AI for Effective Software Development (Springer, 2024)
• Fundamentals of Generative AI
– An Overview on Large Language Models
• Patterns and Tools for the Adoption of
Generative AI in Software Engineering
– Comparing Proficiency of ChatGPT and Bard in
Software Development
– DAnTE: A Taxonomy for the Automation
Degree of Software Engineering Tasks
– ChatGPT Prompt Patterns for Improving Code
Quality, Refactoring, Requirements Elicitation
and Software Design
– Requirements Engineering Using Generative
AI: Prompts and Prompting Patterns
– Advancing Requirements Engineering Through
Generative AI: Assessing the Role of LLMs
• Generative AI in Software Development: Case
Studies
– Generative AI for Software Development: A
Family of Studies on Code Generation
– BERTVRepair: On the Adoption of CodeBERT
for Automated Vulnerability Code Repair
– ChatGPT as a Full-Stack Web Developer
• Generative AI in Software Engineering
Processes
– Transforming Software Development with
Generative AI: Empirical Insights on
Collaboration and Workflow
– How Can Generative AI Enhance Software
Management? Is It Better Done than Perfect?
– Value-Based Adoption of ChatGPT in Agile
Software Development: A Survey Study of
Nordic Software Experts
– Early Results from a Study of GenAI Adoption
in a Large Brazilian Company: The Case of
Globo
• Future Directions and Education
– Generating Explanations for AI-Powered Delay
Prediction in Software Projects
– Classifying User Intent for Effective Prompt
Engineering: A Case of a Chatbot for Startup
Teams
– Toward Guiding Students: Exploring Effective
Approaches for Utilizing AI Tools in
Programming Courses 18
19
[Wang+24] J. Wang, et al., Software Testing with Large Language Models: Survey,
Landscape, and Vision, TSE, 2024
AI/ML for Testing: LLM for testing [Wang+24]
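As one concrete form of the LLM-based testing surveyed in [Wang+24], a minimal sketch of prompting an LLM to draft unit tests for a function; the complete callable stands in for whatever chat/completion API is available and is an assumption, not a specific library call:

# Illustrative LLM-based unit-test generation. `complete` is a placeholder for
# an LLM completion callable (str -> str); it is not a specific library API.
import inspect

PROMPT_TEMPLATE = (
    "You are a software testing assistant.\n"
    "Write pytest unit tests for the following Python function. "
    "Cover normal cases, boundary values, and invalid inputs.\n\n"
    "{source}\n\n"
    "Return only runnable Python code."
)

def generate_tests(func, complete):
    """func: function under test; complete: LLM completion callable (assumed)."""
    prompt = PROMPT_TEMPLATE.format(source=inspect.getsource(func))
    return complete(prompt)  # generated tests still need compilation, execution, and review

# Hypothetical usage:
#   test_code = generate_tests(my_module.parse_date, complete=my_llm_client.complete)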
What do we need to discuss?
• Reliable AI and AI for reliable systems with evaluation
– Multimodal LLMs for Digital Twins (Abdulmotaleb El Saddik)
– Reliable Machine Learning from Imperfect Information: Recent Advances and Future
Challenges (Masashi Sugiyama)
– Performance and Safety Evaluation for N-version Perception Systems (Fumio Machida)
• AI testing: Quality Assurance and Trustworthiness of AI-based Systems
– Quality Assurance of AI-Based Systems (Hiroshi Maruyama)
– Uncertainty-Assisted Testing and Trustworthy Decision Making with Machine Learning
(Shervin Shirmohammadi)
– Testing AI: Navigating Complex Challenges, Approaches, and Future Pathways (San
Murugesan)
– Metrics-based Repair/Break Prediction for Effective DNN Testing (Naoyasu Ubayashi)
– Towards Reliable AI-Enabled Cyber-Physical Systems: Testing, Debugging, and Beyond (Jianjun
Zhao)
– Towards Trustworthy Assurance of AI Systems in the LLM Era (Lei Ma)
– AI quality management and issues in Hitachi DX Engineering Research (Daisuke Shimbara)
• AI for testing: AI and Foundation Model-empowered Testing
– Leveraging LLMs in Software Testing: Current State-of-the-Art and Future Directions (Hussein
Al Osman)
– Automated Test Specification Updates using LLM (Shogo Tokui)
– Testing and Repairing Technique for AI and by AI in Automated Driving Systems: Before and
After LLM (Fuyuki Ishikawa)
– AI Foundation Models for Cyber-Physical Systems Testing (Shaukat Ali)
20