Future Advanced Testing Technology Workshop
Testing Technology and Practice in the age of LLM
Tokyo, Japan, November 1st - 2nd, 2024
• General Chair
– Hironori Washizaki, Waseda University, Japan
• Committee
– Naoyasu Ubayashi, Waseda University, Japan
– Nobukazu Yoshioka, QAML Inc. / Waseda University, Japan
– Hironori Takeuchi, Musashi University, Japan
1
Workshop supporters
2
Opening: IEEE Software Testing Technology
Development Trend
Hironori Washizaki
Professor, Waseda University
IPSJ-SIGSE, Chair
IEEE Computer Society, 2025 President
http://www.washi.cs.waseda.ac.jp/
Future Advanced Testing Technology Workshop
Tokyo, Japan, November 1st - 2nd, 2024
IEEE CS Technology Prediction Team (Chair: Dejan Milojicic)
5
Megatrends in IEEE Future Direction 2023 and IEEE-CS Technology Predictions 2024
IEEE CS Technology Prediction Team (Chair: Dejan Milojicic) https://www.computer.org/resources/2024-top-technology-predictions
• AGI technologies are deeply entangled with social, economic, and ecological aspects.
[Figure: megatrend areas — Next-Gen AI, Generative AI applications, Metaverse, Low-power AI accelerators]
Tech Predictions: Next Generation AI
Problems/demand:
• General Intelligence: Current AI systems are specialized and narrow. Evolving towards AGI requires interdisciplinary collaboration across computer science, engineering, ethics, and even philosophy.
• Trust and explainability: The black-box nature of AI can cause a reduction in interpersonal trust. We need technology that prevents deriving secrets from large language models and takes ethical considerations and data privacy into account.
• AI sustainability: As AI models keep growing, the excessive data-center load raises concerns about environmental impact. Increased model efficiency, improved accuracy, and greater flexibility are key.
• Human-centered AI: Next-generation AI should focus on enhancing human capabilities, for example by increasing the level of empathy.
IEEE CS Technology Prediction Team (Chair: Dejan Milojicic) https://www.computer.org/resources/2024-top-technology-predictions
Opportunities:
• Enhanced creativity in arts and design, accelerated design processes, and collaborative human-AI creative processes.
• Generative-AI-based revolution in personalized medicine, from drug discovery to tailored treatment plans.
• Personalized education and marketing that boost productivity.
• Improved customer support through natural conversational interactions, problem solving, and detailed product knowledge.
• Accelerated scientific discovery and 3D modeling.
HCAI, TAI and XAI
7
[Chamola+23] V. Chamola, et al., A Review of Trustworthy and Explainable Artificial Intelligence (XAI), IEEE Access, 11, 2023
• Responsible AI (RAI) development brings to the table issues including, but not limited to, fairness, explainability, and privacy in AI, and centers AI around humans [Tahael+23]
Trustworthy AI (TAI) aspects [Chamola+23]
Explainable AI (XAI) approaches [Chamola+23]
[Tahael+23] M. Tahael, et al., A Systematic Literature Review of Human-Centered, Ethical, and Responsible AI, arXiv:2302.05284v3, 2023
Human-centered AI (HCAI) [Tahael+23]
[Chamola+23] V. Chamola, et al., A Review of Trustworthy and Explainable Artificial Intelligence (XAI), IEEE Access, 11, 2023
[Noyori+23] Y. Noyori, H. Washizaki, et al., “Deep learning and gradient-based extraction of bug report features related to bug fixing time,” Frontiers in Computer Science, 5, 2023
XAI approaches and more
• XAI approaches in autonomous systems
[Chamola+23]
– Explanation and interpretations of DL models
– Observation-to-action guideline
– Causal inferences in/out of interpretations
– Goal-based forecasting
– Purpose/objective identification
• Technical and social reliability of
explanations
– E.g., fake explanation by surrogate models
and examples
• Counterfactual explanations
– What should be different in the input instance to change the outcome [Guidotti24]
– Needs to consider the constraint of ensuring that reasonable actions exist for as many instances as possible [Kanamori+24] (a minimal counterfactual-search sketch follows the references below)
8
[Guidotti24] Riccardo Guidotti, Counterfactual explanations and how to find them: literature review and benchmarking, Data Mining and Knowledge Discovery, 38, 2024
[Kanamori+24] K. Kanamori, et al., Learning Decision Trees and Forests with Algorithmic Recourse, ICML 2024
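To make the counterfactual idea above concrete, a minimal sketch of a brute-force counterfactual search for a tabular classifier; the predict callable, feature grid, and L1 cost are illustrative assumptions, not the methods of [Guidotti24] or [Kanamori+24]:

# Illustrative counterfactual search: find the cheapest small change to x's
# features that flips the black-box model's predicted class.
import itertools
import numpy as np

def find_counterfactual(predict, x, candidate_values, target_class, max_changed=2):
    """predict: 1-D feature vector -> class label; x: 1-D NumPy array;
    candidate_values: {feature_index: iterable of alternative numeric values} (illustrative)."""
    best, best_cost = None, np.inf
    features = list(candidate_values)
    for k in range(1, max_changed + 1):
        for subset in itertools.combinations(features, k):
            for values in itertools.product(*(candidate_values[i] for i in subset)):
                x_cf = x.copy()
                x_cf[list(subset)] = values
                if predict(x_cf) == target_class:
                    cost = np.abs(x_cf - x).sum()  # L1 "effort" of the change
                    if cost < best_cost:
                        best, best_cost = x_cf, cost
    return best  # None if no counterfactual exists within max_changed features

For a loan-approval model, for example, this returns the cheapest small feature change that flips a rejection into an approval; the recourse constraint in [Kanamori+24] additionally asks that such actionable changes exist for as many instances as possible.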
[Figure: example bug report excerpts ("using 19990914 build on win98 …", "with the same server the problem was with … to figure out what the actual bug …") contrasting short vs. long bug-fixing time [Noyori+23]]
[Figure: knowledge areas and topics in a body of knowledge link reference material, skills, competencies, and jobs/roles, supported by SWEBOK, SWECOM, EITBOK, learning courses, and Software Engineering Professional Certifications]
9
Guide to the Software Engineering Body of Knowledge (SWEBOK) [Washizaki24]
https://www.computer.org/education/bodies-of-knowledge/software-engineering
• Guides researchers and practitioners in identifying and building a common understanding of “generally accepted knowledge” in software engineering
• Foundation for certifications and educational curricula
• ‘01 v1, ‘04 v2, ‘05 ISO adoption, ‘14 v3, ’24 v4 just published!
[Washizaki24] H. Washizaki, eds., Guide to the Software Engineering Body of Knowledge (SWEBOK Guide), Version 4.0, IEEE Computer Society, 2024
SWEBOK Evolution from V3 to V4
• Modern engineering and practice updates; the BOK grows with recently developed areas
V3 knowledge areas: Requirements, Design, Construction, Testing, Maintenance, Configuration Management, Engineering Management, Process, Models and Methods, Quality, Professional Practice, Economics, Computing Foundations, Mathematical Foundations, Engineering Foundations
V4 knowledge areas: Requirements, Architecture, Design, Construction, Testing, Operations, Maintenance, Configuration Management, Engineering Management, Process, Models and Methods, Quality, Security, Professional Practice, Economics, Computing Foundations, Mathematical Foundations, Engineering Foundations
Highlighted updates in V4: Agile, DevOps, and AI and SE
[Figure: metamodel-based ML evaluation, visualizing issues (some elements Failed) and, after adding a repair strategy covering ML training and ML repair, visualizing the resolution (elements OK). The modeled object-detection example:]
– [ML.VP1 ← AI.VP1] Provide a reliable real-time object detection system for driving decision making on highways (incl. traffic sign detection and lane/vehicle detection)
– [ML.DS1] Procured datasets; [ML.DS2] Internal database from collection during operation
– [ML.DC1] Open and commercial datasets; [ML.DC2] Data collected during operation (image and identification result)
– [ML.F1 ← AI.D1/AI.D3] Bounding box for objects (incl. other vehicles or signs); [ML.F2 ← AI.D2] Ridge detection for lane detection
– [ML.BM1] Models will be developed, tested, and deployed to cars monthly
– [ML.PT1] Input: image from sensors; [ML.PT2 ← AI.D] Output: traffic signs, lane marking, vehicles, and pedestrians
– [ML.De1] Use prediction results for decision making in the self-driving system
– [ML.IS1] Using test data, achieve very high recall and high precision in the following conditions: night, rainy, and general; datasets are split in an 80:20 ratio (see the evaluation sketch at the end of this slide)
– [ML.MP1] Prediction should be made in batches / real time
– [ML.M1] Input data monitoring
SE4AI: System modeling and MLOps integration [Takeuchi+24][Husen+24]
12
[Husen+24] J. H. Husen, H. Washizaki, et al., Integrated Multi-view Modeling for Reliable Machine Learning-Intensive Software Engineering, Software Quality Journal, 32, 2024
[Takeuchi+24] H. Takeuchi, et al., Enterprise Architecture-based Metamodel for Machine Learning Projects and its Management, Future Generation Computer Systems, 2024
[Figure: related software engineering knowledge areas and activities — Requirements, Construction, Design, Test, Architecture, Operations, Economics, Models and Methods, Quality, Requirements analysis and design]
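To make the [ML.IS1] evaluation criterion above concrete, a minimal sketch, assuming scikit-learn, a generic classifier standing in for the object detector, and a per-sample condition label; all helper names and thresholds are illustrative, not values from the slide:

# Illustrative [ML.IS1]-style check: 80:20 split, then per-condition
# recall/precision thresholds (thresholds here are placeholders).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

def evaluate_per_condition(X, y, conditions, recall_min=0.95, precision_min=0.90):
    """X, y, conditions: NumPy arrays; conditions holds e.g. "night", "rainy", "general"."""
    X_tr, X_te, y_tr, y_te, _, cond_te = train_test_split(
        X, y, conditions, test_size=0.20, random_state=0)   # 80:20 split
    model = RandomForestClassifier().fit(X_tr, y_tr)
    y_pred = model.predict(X_te)
    results = {}
    for c in np.unique(cond_te):                             # evaluate each condition separately
        mask = cond_te == c
        results[c] = {
            "recall": recall_score(y_te[mask], y_pred[mask], average="macro"),
            "precision": precision_score(y_te[mask], y_pred[mask], average="macro"),
        }
    passed = all(r["recall"] >= recall_min and r["precision"] >= precision_min
                 for r in results.values())
    return passed, results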
SWEBOK: AI for SE, SE for AI
• Software engineering and AI are related to each other in basically two ways: software engineering for AI systems (SE for AI) and AI applications in software engineering (AI for SE)
• SE for AI
– There is a need for particular SE support for AI, such as interdisciplinary collaborative teams of data scientists and software engineers, software evolution focusing on large and changing datasets, and ethics and equity requirements engineering
– E.g., ML testing
• AI for SE
– Aims to establish efficient ways of building high-quality
software systems by replicating human developers’
behavior.
– E.g., AI/ML for software testing
13
[Figure: software engineering and AI, connected by “AI for SE” and “SE for AI”]
ML Testing: Approaches and challenges in industry [Rahman+23]
• ML model implementation testing
– Performance-based testing: sanity checks, performance against a benchmark/baseline, and cross-model/algorithm/language/platform testing (a minimal sketch follows the references below)
– Visualization
– Traditional unit testing and debugging
– Domain knowledge-based validation
• ML code defect detection
– Performance based symptoms
– Training behaviors
– Model output
• Challenges of testing ML applications
– Black-box nature of ML models
– Model’s robustness to errors
– Data quality
– Volatile performance
– Domain expertise
– Cost
– Lack of concrete methodology
– Interpretability, explainability
• Challenges of post-deployment testing
– Test data
– Performance
– Resource requirements
– System complexity
– Platform diversity
– Adaptability
– User satisfaction
14
[Rahman+23] M.S. Rahman, F. Khomh, A. Hamidi, J. Cheng, G. Antoniol, H. Washizaki, “Machine Learning Application Development: Practitioners’ Insights,”
Software Quality Journal, 31, 2023.
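As one concrete shape of the performance-based testing practices above, a minimal pytest-style sanity check against a baseline; load_model and load_eval_data are hypothetical project helpers, and the threshold is illustrative:

# Illustrative performance-based ML model tests (pytest style).
# load_model / load_eval_data are hypothetical project helpers, not a real API.
import numpy as np
from sklearn.metrics import accuracy_score

BASELINE_ACCURACY = 0.82   # e.g. accuracy of the previously released model (placeholder)

def test_candidate_beats_baseline():
    model = load_model("candidate")        # hypothetical helper
    X, y = load_eval_data("holdout")       # hypothetical helper
    acc = accuracy_score(y, model.predict(X))
    assert acc >= BASELINE_ACCURACY, f"accuracy {acc:.3f} is below the baseline"

def test_sanity_constant_input():
    # Sanity check: an all-zero, uninformative input should not be classified
    # with near-certain confidence.
    model = load_model("candidate")
    X, _ = load_eval_data("holdout")
    constant = np.zeros((1, X.shape[1]))
    assert model.predict_proba(constant).max() < 0.99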
ML Testing: Topics and approaches
• Bugs & debugging
• Explanations
• Quality testing
• Test architecture & languages
• Test case: Generation &
selection
• Testing metrics
• Testing methods [Sherin+19][Wan+24]
– Metamorphic testing (a minimal sketch follows the references below)
– Coverage based testing
– Adversarial testing
– Mutation testing
– Symbolic & concolic testing
– Multi-implementation testing
– Evolutionary computing
– Search-based testing
– Fuzzing
– Combinatorial testing
15
[Fernandez+22] S.M. Fernández, et al., Software Engineering for AI-Based Systems: A Survey. ACM Trans. Softw. Eng. Methodol. 31(2), 2022
[Wan+24] X. Wan, et al., Coverage-guided fuzzing for deep reinforcement learning systems, JSS, 210, 2024
[Sherin+19] S. Sherin, et al., A Systematic Mapping Study on Testing of Machine Learning Programs, arXiv:1907.09427, 2019
[Fernandez+22]
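To illustrate the first testing method in the list above, a minimal metamorphic-testing sketch for an image classifier; the model interface and the brightness relation are illustrative assumptions:

# Illustrative metamorphic test: the predicted class of an image classifier
# should not change under a mild brightness increase (the metamorphic relation).
import numpy as np

def brighten(image, delta=10):
    """Follow-up input: a slightly brighter copy of a uint8 HxWxC image."""
    return np.clip(image.astype(np.int16) + delta, 0, 255).astype(np.uint8)

def count_relation_violations(model, images):
    """model.predict maps a batch of images to class labels (assumed interface)."""
    violations = 0
    for img in images:
        source_label = model.predict(img[np.newaxis])[0]               # source test case
        follow_up_label = model.predict(brighten(img)[np.newaxis])[0]  # follow-up test case
        if source_label != follow_up_label:
            violations += 1
    return violations  # any violation signals a potential defect

The appeal of metamorphic testing here is that no labeled ground truth is needed; only the relation between source and follow-up outputs is checked.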
ML Testing: Taxonomy
16
[Song+22] Q. Song, et al., Exploring ML testing in
practice – Lessons learned from an interactive rapid
review with Axis Communications, CAIN 2022
[Zhang+22] J.M. Zhang, et al., Machine
Learning Testing: Survey, Landscapes and
Horizons, TSE, 48(1), 2022
AI/ML for Testing: Topics [Yang+22]
• Bug-related detection
• Bug localization
• Vulnerability detection
• Testing techniques
• Test case generation
• Program analysis
17
[Yang+22] Y. Yang, et al., A Survey on Deep Learning for Software Engineering, ACM Computing Surveys, 54(10), 2022
Generative AI for Effective Software Development (Springer, 2024)
• Fundamentals of Generative AI
– An Overview on Large Language Models
• Patterns and Tools for the Adoption of
Generative AI in Software Engineering
– Comparing Proficiency of ChatGPT and Bard in
Software Development
– DAnTE: A Taxonomy for the Automation
Degree of Software Engineering Tasks
– ChatGPT Prompt Patterns for Improving Code
Quality, Refactoring, Requirements Elicitation
and Software Design
– Requirements Engineering Using Generative
AI: Prompts and Prompting Patterns
– Advancing Requirements Engineering Through
Generative AI: Assessing the Role of LLMs
• Generative AI in Software Development: Case
Studies
– Generative AI for Software Development: A
Family of Studies on Code Generation
– BERTVRepair: On the Adoption of CodeBERT
for Automated Vulnerability Code Repair
– ChatGPT as a Full-Stack Web Developer
• Generative AI in Software Engineering
Processes
– Transforming Software Development with
Generative AI: Empirical Insights on
Collaboration and Workflow
– How Can Generative AI Enhance Software
Management? Is It Better Done than Perfect?
– Value-Based Adoption of ChatGPT in Agile
Software Development: A Survey Study of
Nordic Software Experts
– Early Results from a Study of GenAI Adoption
in a Large Brazilian Company: The Case of
Globo
• Future Directions and Education
– Generating Explanations for AI-Powered Delay
Prediction in Software Projects
– Classifying User Intent for Effective Prompt
Engineering: A Case of a Chatbot for Startup
Teams
– Toward Guiding Students: Exploring Effective
Approaches for Utilizing AI Tools in
Programming Courses 18
19
[Wang+24] J. Wang, et al., Software Testing with Large Language Models: Survey,
Landscape, and Vision, TSE, 2024
AI/ML for Testing: LLM for testing [Wang+24]
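As one concrete form of the LLM-based testing surveyed in [Wang+24], a minimal sketch of prompting an LLM to draft unit tests for a function; the complete callable stands in for whatever chat/completion API is available and is an assumption, not a specific library call:

# Illustrative LLM-based unit-test generation. `complete` is a placeholder for
# an LLM completion callable (str -> str); it is not a specific library API.
import inspect

PROMPT_TEMPLATE = (
    "You are a software testing assistant.\n"
    "Write pytest unit tests for the following Python function. "
    "Cover normal cases, boundary values, and invalid inputs.\n\n"
    "{source}\n\n"
    "Return only runnable Python code."
)

def generate_tests(func, complete):
    """func: function under test; complete: LLM completion callable (assumed)."""
    prompt = PROMPT_TEMPLATE.format(source=inspect.getsource(func))
    return complete(prompt)  # generated tests still need compilation, execution, and review

# Hypothetical usage:
#   test_code = generate_tests(my_module.parse_date, complete=my_llm_client.complete)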
What do we need to discuss?
• Reliable AI and AI for reliable systems with evaluation
– Multimodal LLMs for Digital Twins (Abdulmotaleb El Saddik)
– Reliable Machine Learning from Imperfect Information: Recent Advances and Future
Challenges (Masashi Sugiyama)
– Performance and Safety Evaluation for N-version Perception Systems (Fumio Machida)
• AI testing: Quality Assurance and Trustworthiness of AI-based Systems
– Quality Assurance of AI-Based Systems (Hiroshi Maruyama)
– Uncertainty-Assisted Testing and Trustworthy Decision Making with Machine Learning
(Shervin Shirmohammadi)
– Testing AI: Navigating Complex Challenges, Approaches, and Future Pathways (San
Murugesan)
– Metrics-based Repair/Break Prediction for Effective DNN Testing (Naoyasu Ubayashi)
– Towards Reliable AI-Enabled Cyber-Physical Systems: Testing, Debugging, and Beyond (Jianjun
Zhao)
– Towards Trustworthy Assurance of AI Systems in the LLM Era (Lei Ma)
– AI quality management and issues in Hitachi DX Engineering Research (Daisuke Shimbara)
• AI for testing: AI and Foundation Model-empowered Testing
– Leveraging LLMs in Software Testing: Current State-of-the-Art and Future Directions (Hussein
Al Osman)
– Automated Test Specification Updates using LLM (Shogo Tokui)
– Testing and Repairing Technique for AI and by AI in Automated Driving Systems: Before and
After LLM (Fuyuki Ishikawa)
– AI Foundation Models for Cyber-Physical Systems Testing (Shaukat Ali)
20