My Ph.D. thesis on Artificial Intelligence at Ecole nationale Superieure d'Informatique (ESI) of Algiers. The thesis is about case-based reasoning founded on randomization. It is supervised by Pr. Stuart Rubin from NIWC Pacific, San Diego, and by Dr. Lydia Bouzar from the LCSI laboratory of ESI, Algiers.
Case based reasoning founded on randomization final
1. Case-Based Reasoning Founded On
Randomization
Presented by: Miled Basma BENTAIBA LCSI, ESI, Algiers, Algeria
Supervisor: Pr. Stuart H. RUBIN NIWC Pacific, San Diego, CA, USA
Co-supervisor: Dr. Lydia BOUZAR LCSI, ESI, Algiers, Algeria
Thesis defense
Prepared with a view to obtaining an
LMD Doctorate in Computer Science
July 2021
5. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Context & Motivation
World
Problems tend to
recur
Similar problems have
similar solutions
6. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Context & Motivation
The concept of
problem solving for
humans
Case-based reasoning
imitates the human
thinking
An evolutionary case
base versus a static
case base
7. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Case-Based Reasoning
Retrieve
Reuse
Revise
Retain Case base
New problem
Case base
[Aamodt and Plaza, 1994]
Learned case
8. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Problem Statement & Proposed Solution
How to ensure
accuracy and
efficiency of CBR’s
problem resolution?
Current Solutions
• Feed the case base
using inference
methods
Problem
• Generate a massive
case base
• Time consuming
• Applied to rules
Proposed Solution
• Use randomization to
generalize and to
generate new data
Problem
• The generated cases
may not be valid
Solution
• Validate the generated
cases before their use
9. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Randomization
Random
Transformations
Non-inferential
transmutation
e.g. Mutation
Operation in GA
10. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Randomization in Case-Based Reasoning
There are three main reasons for using randomization in CBR:
Enlarge the reasoning
Space
Extract explicit
knowledge from
implicit one
Keep the case base
compressed &
optimized
12. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Case-Based Reasoning Tasks
Tasks
of
Case-Based
Reasoning
Analytic Tasks
Classification
Diagnosis
Prediction
Synthetic Tasks
Configuration
Planning
Design
13. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Case-Based Reasoning for Analytic Tasks
Tasks
of
Case-Based
Reasoning
Analytic Tasks
Synthetic Tasks
Retrieve
Reuse
Revise
Retain
Divide the case base into
small case bases
[Fan et al., 2011; Smiti and Elouedi, 2013]
Shrink the volume of the
case base into smaller size
[Smiti and Elouedi, 2018; Bentaiba-Lagrid et
al., 2018; Smiti and Elouedi, 2010; Smiti and
Elouedi, 2014; Yan et al., 2016]
Case
base
New
problem
14. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Case-Based Reasoning for Analytic Tasks
Tasks
of
Case-Based
Reasoning
Analytic Tasks
Synthetic Tasks
Retrieve
Reuse
Revise
Retain Case
base
Rank features according
to their importance
[Ahn and Kim, 2009; Elter et al., 2007; Huang
et al., 2012; Yan et al., 2016]
Use intelligent systems for
the retrieve
[Quellec et al., 2010; Rezvan et al., 2013]
New
problem
15. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Case-Based Reasoning for Analytic Tasks
Tasks
of
Case-Based
Reasoning
Analytic Tasks
Synthetic Tasks
Retrieve
Reuse
Revise
Retain Case
base
New
problem
Use adaptation for the
Reuse
[Mazurowski et al., 2008; Yan et al., 2016]
Decide whether retaining
a case or not
[Sharaf-El-Deen et al., 2014; Yan et al., 2016]
16. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Case-Based Reasoning for Analytic Tasks
Tasks
of
Case-Based
Reasoning
Analytic Tasks
Synthetic Tasks
Retrieve
Reuse
Revise
Retain Case
base
New
problem
Revise is highly related
to the application
domain
17. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Case-Based Reasoning for Synthetic Tasks
Tasks
of
Case-Based
Reasoning
Analytic Tasks
Synthetic Tasks
Retrieve
Reuse
Revise
Retain Case
base
New
problem
Structure the cases into a
hierarchy of sub-components
that constitute the cases
[Burke et al., 2001; Burke et al., 2006]
Decompose the case into
small solvable problems
[Burke et al., 2006]
Model the case base in
Petri net language
[Lim et al., 2015]
19. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Research Directions & Contributions
Task                        Contribution          Application Domain
Analytic Tasks: Diagnosis   Cases Randomization   Mammography Mass
Analytic Tasks: Diagnosis   CBR modules           Medical Diagnosis
Synthetic Tasks: Planning   Cases Randomization   Route Planning
Synthetic Tasks: Design     Cases Randomization   Scheduling Systems
21. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Contributions on Randomization
User Interface
Case Base
Cases
Randomization
Retrieve
Reuse
Retain
Revise
Case-Based Reasoning
Validation of
Generated
Cases
Overview
22. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Contributions on Randomization
Contribution 1:
Route planning
Contribution 2:
Scheduling
Systems
Contribution 3:
Mammography
Mass
Application Domains
23. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Contributions on Randomization
Approaches Details: Cases Randomization
Route planning
• Problem part substitution
• Solution part substitution
Scheduling System
• Problem part substitution
Mammography Mass
• Problem part substitution
24. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Contributions on Randomization
Approaches Details: Cases Validation
Route planning
• Coherence verification
Scheduling System
• Adding constraints (pre-randomization)
• Coherence verification (post-randomization)
Mammography Mass
• Coherence verification
• Stochastic validation
• Absolute validation
25. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Contributions on Randomization
Approaches Details: Datasets
Route planning
• OpenStreetMap across Algiers city
Scheduling System
• Project Scheduling Problem Library
Mammography Mass
• UCI Machine Learning Repository
26. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Contributions on Randomization
Approaches Details: Research Findings
Route planning
• 40% of problems are resolved using new cases
• Case base is augmented by 140%
Scheduling System
• 30% of problems are resolved using new cases
• 91% of results are better than the benchmarks
• Improvement of 2 to 15%
Mammography Mass
• Problem resolution increased by 8%
• 90% of generated cases are already valid
27. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Discussion on Contributions on Randomization
Limitations
The involvement of the expert is necessary for validation
The case base is controlled by the randomization module
No contributions inside the case-based reasoning modules
The validation module is separate from the revise module
User Interface
Case Base
Cases
Randomization
Retrieve
Reuse
Retain
Revise
Case-Based Reasoning
Validation of
Generated Cases
29. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Case-Based Reasoning For Medical Diagnosis
Retrieve
• Similarity functions
Reuse
• Copy cases solutions
• Copy delegates
• Test with rules
Retain
• Store the case in
segments or increment
its frequency
Revise
• Coherence verification
• Stochastic validation
• Absolute validation
Case Base
Segmented with delegates
Feature Selection &
Weighting Module
• Offline process
Rules Generation
Module
• Periodic Process
Amplification using
Randomization Module
• At each new coming case
Case-Based Reasoning External Modules
User Interface
30. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Case-Based Reasoning For Medical Diagnosis
Retrieve
• Similarity functions
Reuse
• Copy cases solutions
• Copy delegates
• Test with rules
Retain
• Store the case in
segments or increment
its frequency
Revise
• Coherence verification
• Stochastic validation
• Validation using rules
Case Base
Segmented with delegates
Feature Selection &
Weighting Module
• Offline process
Rules Generation
Module
• Periodic Process
Amplification using
Randomization Module
• At each new coming case
Case-Based Reasoning External Modules
User Interface
31. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Case-Based Reasoning For Medical Diagnosis
1. Feature Selection & Weighting
                                          Highly similar values of feature ti   Slightly similar values of feature ti
Pairs of cases with the same solution     V                                     Y
Pairs of cases with different solutions   X                                     Z
32. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Case-Based Reasoning For Medical Diagnosis
2. Rules Generation
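[Figure: rule tree with one level per feature: BI-RADS, Mass Margin, Mass Shape, Age]
Start Point (Benign)
  BI-RADS: 3
    Mass Margin: Ill-defined
      Mass Shape: Lobular
        Age: 52
  BI-RADS: 4
    Mass Margin: Circumscribed
      Mass Shape: Round
        Age: 54
      Mass Shape: Oval
        Age: 60
      Mass Shape: Lobular
        Age: 36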
34. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Case-Based Reasoning For Medical Diagnosis
Case Base Segmentation
Case form: Cj: problem (Pj), solution (Sj)
Sectors: sector presented by S = S1 … sector presented by S = Sm
Representative of a segment, composed of: S = S1, Delegate = Dk
Delegates: delegate D1 … delegate Dk … delegate Dj
Levels:
level 1: Pj highly similar to D1 … Pj highly similar to Dj
level 2: Pj less highly similar to D1 … Pj less highly similar to Dj
…
level n: Pj slightly similar to D1 … Pj slightly similar to Dj
36. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Case-Based Reasoning For Medical Diagnosis
Retrieve → Reuse
Cases with similarity > threshold → return the solution(s) of the selected case(s)
Segments with similarity > threshold → return the solution(s) of the segment(s)
Test the problem with different solutions using the generated rules → return the successful solution(s)
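The three retrieve/reuse paths on this slide can be read as a cascade. The Python sketch below is only one possible reading: it assumes the paths are tried in order (cases, then segment delegates, then rules), and the `overlap` similarity, the function names, the threshold value and the toy data are all hypothetical.

```python
def overlap(p, q):
    """Illustrative similarity: fraction of equal feature values."""
    return sum(a == b for a, b in zip(p, q)) / len(p)


def retrieve_and_reuse(problem, cases, delegates, rules, threshold):
    """Return the set of proposed solutions for `problem`."""
    # Path 1: stored cases whose similarity exceeds the threshold.
    solutions = {s for p, s in cases if overlap(p, problem) > threshold}
    if solutions:
        return solutions
    # Path 2: segments, compared through their delegates.
    solutions = {s for d, s in delegates if overlap(d, problem) > threshold}
    if solutions:
        return solutions
    # Path 3: test each candidate solution with its rule (assumed predicates).
    return {s for s, rule in rules.items() if rule(problem)}


# Toy data, invented for the example.
cases = [(("round", "circumscribed", "young"), "benign")]
delegates = [(("irregular", "spiculated", "old"), "malignant")]
rules = {
    "benign": lambda p: p[0] in ("round", "oval"),
    "malignant": lambda p: p[0] == "irregular",
}

by_case = retrieve_and_reuse(("round", "circumscribed", "old"), cases, delegates, rules, 0.6)
by_segment = retrieve_and_reuse(("irregular", "spiculated", "young"), cases, delegates, rules, 0.6)
by_rules = retrieve_and_reuse(("oval", "obscured", "young"), cases, delegates, rules, 0.6)
```

Each call returns a set because several cases, segments or rules may propose a solution at once.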
38. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Case-Based Reasoning For Medical Diagnosis
1. Revise & Retain Process
[Figure: Revise module. A new case goes through coherence verification; coherent cases go on to stochastic validation, with two outcomes (validity >= threshold, validity < threshold); absolute validation is the final layer.]
• Coherence verification using our stochastic grammar
• Stochastic validation: probability of the case’s validity, a dynamic value
• Absolute validation: validation using the generated rules; expert validation (non-essential)
39. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Case-Based Reasoning For Medical Diagnosis
Experiments: Datasets
Mammography Mass
• 5 features (4 categorical,
1 numerical)
• 2 classes
• 961 cases
Thyroid Disease
• 5 features (5 numerical)
• 3 classes
• 215 cases
40. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Case-Based Reasoning For Medical Diagnosis
Experiments: Case-Based Reasoning Prototypes
Configuration   Case Base Segmentation   Randomization   Retrieve
NR_NS           Flat                     No              Cases
NR_S            Segmented                No              Cases & Delegates
R_S             Segmented                Yes             Cases & Delegates
R_NS            Flat                     Yes             Cases
41. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Case-Based Reasoning For Medical Diagnosis
Experiments Map
Experiments Benchmarks
Black Box
White Box
Case Base Amplification
Feature Weighting
CBR Modules
Machine Learning
Other Related Work
Confusion Matrix Metrics
ROC Curve & AUC
Resolution Time
42. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Case-Based Reasoning For Medical Diagnosis
Black Box Experiments
Experiments Benchmarks
Black Box
White Box
Case Base Amplification
Feature Weighting
CBR Module
Machine Learning
Other Related Work
Confusion Matrix Metrics
ROC Curve & AUC
Resolution Time
43. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Case-Based Reasoning For Medical Diagnosis
Black Box Exp: Metrics Related to Confusion Matrix
[Figure: two bar charts (scale 0 to 100) of Resolution Capacity, Accuracy and F1 Score for the NR_NS, NR_S, R_NS and R_S prototypes; one panel for Mammography Mass, one for Thyroid Disease]
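For reference, accuracy and F1 score follow directly from the confusion-matrix counts. The short Python sketch below uses the standard definitions; the counts are invented for illustration and are not results from these experiments. Resolution capacity is left out because its definition is not given on the slide.

```python
def accuracy(tp, fp, fn, tn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: true/false positives and negatives.
tp, fp, fn, tn = 40, 10, 5, 45
acc = accuracy(tp, fp, fn, tn)
f1 = f1_score(tp, fp, fn)
```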
45. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Case-Based Reasoning For Medical Diagnosis
Experiments Benchmarks
Black Box
White Box
Case Base Amplification
Feature Weighting
CBR Modules
Machine Learning
Other Related Work
Confusion Matrix Metrics
ROC Curve & AUC
Resolution Time
Benchmarks
46. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Case-Based Reasoning For Medical Diagnosis
Benchmarks: Comparison to Machine Learning (Using Accuracy)
Accuracy (%)   Mammography Mass   Thyroid Disease
LR             82.29              98.57
SVM            82.29              96
KNN            82.29              100
NB             82.29              96
Perceptron     76.04              92
DT             80.21              96
RF             85.42              96
NR_NS          73.48              35.18
NR_S           67.6               83.33
R_NS           99.7               97.52
R_S            88.15              94.81
47. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Case-Based Reasoning For Medical Diagnosis
Benchmarks: Comparison to Related Works (Using Accuracy)
Mammography Mass (accuracy, %)
GDM-GA-CBR   75.45
GA-CBR       78.13
WE-CBR       79.47
DSL-CBR      79.79
NN           80
WEH-CBR      80.62
DUL-CBR      82.29
NR_NS        77.08
NR_S         58.33
R_NS         98.96
R_S          96.87
Thyroid Disease (accuracy, %)
GDA-WSVM     91.86
DeFalco      94.14
GA           95.16
CRCR_SVM     95.3
Somrani      95.7
Hayashi      97.02
FS-PSO-SVM   97.49
NR_NS        35.18
NR_S         83.33
R_NS         97.52
R_S          94.81
48. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Case-Based Reasoning For Medical Diagnosis
Discussion
Categorical Dataset VS Numerical Dataset
Randomization VS No Randomization
Segmentation VS No Segmentation
Which Prototype is Recommended?
49. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Case-Based Reasoning For Medical Diagnosis
Discussion
NR_NS
• Bad accuracy
NR_S
• Bad accuracy
R_NS
• Excellent accuracy
• Very bad resolution time
R_S
• Very good accuracy
• Good resolution time
51. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Objectives
Propose a
Randomization
Approach for
Knowledge
Amplification
Build a CBR with the
proposed
Randomization
Apply the Full
Approach on Real
Datasets
Ensure Accuracy
Ensure Good
Resolution Time
52. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Challenges
Research on CBR has declined in favor of
ML
“Accuracy Takes All” Paradigm
Randomization in CBR is a Recent
Research Direction
53. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Our Achievements
Randomization for
Planning Tasks
• Integrate it in CBR
• Test it on Route
Planning Problems
Randomization for
Design Tasks
• Integrate it in CBR
• Test it on Scheduling
Problems
Randomization for
Diagnosis Tasks
• Integrate it in CBR
• Test it on
Mammographic Mass
Classification Problem
CBR For Diagnosis
Tasks
• Extend the previous
approach
• Test it on the Medical
Field
54. Introduction – State of the Art – Contributions – Conclusion & Perspectives
Perspectives
• Add a layer for
conversational CBR
Improve
• Create a global
framework for
analytic tasks
• Create a global
framework for
synthetic tasks
Generalize
• Create a hybrid
system by combining
ML with CBR
[Burca et al., 2018]
New Research
Directions
55. List of Publications
ESWA 2020 Bentaiba-Lagrid, M. B., Bouzar-Benlabiod, L., Rubin, S. H., & Hanini, M. R. (2020). A Case-Based Reasoning System for Supervised Classification Problems in the Medical Field. Expert Systems with Applications, 113335.
IRI 2017 Bouabana-Tebibel, T., Rubin, S. H., Bentaiba, M. B., Allaoua, A., & Boumhand, A. (2017, August). Knowledge Amplification through Randomization for Scheduling Systems. In 2017 IEEE International Conference on Information Reuse and Integration (IRI) (pp. 589-598). IEEE.
IRI 2018a Bentaiba-Lagrid, M. B., Bouzar-Benlabiod, L., Rubin, S. H., Bouabana-Tebibel, T., & Hanini, M. R. (2018, July). Knowledge Amplification Using Randomization in Case-Based Reasoning--Case Study: Severity of Mammography Mass. In 2018 IEEE International Conference on Information Reuse and Integration (IRI) (pp. 155-162). IEEE.
IRI 2018b Bouabana-Tebibel, T., Rubin, S. H., Bouzar-Benlabiod, L., Bentaiba-Lagrid, M. B., & Hanini, M. R. (2018, July). Knowledge-Based Randomization for Amplification. In 2018 IEEE International Conference on Information Reuse and Integration (IRI) (pp. 147-154). IEEE.
CIIA 2018 Bentaiba-Lagrid, M. B., Bouzar-Benlabiod, L., & Rubin, S. H. (2018, May). Randomization Approach in Case-Based Reasoning System to Amplify the Knowledge Base. In 2018 IFIP International Conference on Computational Intelligence and Its Applications (CIIA).
57. References
Aamodt, A., & Plaza, E. (1994). Case-based reasoning: Foundational issues, methodological variations, and system approaches. AI Communications, 7(1), 39-59.
Ahn, H., & Kim, K. J. (2009). Global optimization of case-based reasoning for breast cytology diagnosis. Expert Systems with Applications, 36(1), 724-734.
Bentaiba-Lagrid, M. B., Bouzar-Benlabiod, L., Rubin, S. H., Bouabana-Tebibel, T., & Hanini, M. R. (2018, July). Knowledge Amplification Using Randomization in Case-Based Reasoning--Case Study: Severity of Mammography Mass. In 2018 IEEE International Conference on Information Reuse and Integration (IRI) (pp. 155-162). IEEE.
Burca, D., Schüller, M., & Zlabinger, J. (2018). Case-based Reasoning and Machine Learning.
Burke, E. K., MacCarthy, B., Petrovic, S., & Qu, R. (2001, July). Case-based reasoning in course timetabling: an attribute graph approach. In International Conference on Case-Based Reasoning (pp. 90-104). Springer, Berlin, Heidelberg.
Burke, E. K., MacCarthy, B. L., Petrovic, S., & Qu, R. (2006). Multiple-retrieval case-based reasoning for course timetabling problems. Journal of the Operational Research Society, 57(2), 148-162.
Elter, M., Schulz‐Wendtland, R., & Wittenberg, T. (2007). The prediction of breast cancer biopsy outcomes using two CAD approaches that both emphasize an intelligible decision process. Medical Physics, 34(11), 4164-4172.
Fan, C. Y., Chang, P. C., Lin, J. J., & Hsieh, J. C. (2011). A hybrid model combining case-based reasoning and fuzzy decision tree for medical data classification. Applied Soft Computing, 11(1), 632-644.
Huang, M. L., Hung, Y. H., Lee, W. M., Li, R. K., & Wang, T. H. (2012). Usage of case-based reasoning, neural network and adaptive neuro-fuzzy inference system classification techniques in breast cancer dataset classification diagnosis. Journal of Medical Systems, 36(2), 407-414.
58. References
Lim, J., Chae, M. J., Yang, Y., Park, I. B., Lee, J., & Park, J. (2015). Fast scheduling of semiconductor manufacturing facilities using case-based reasoning. IEEE Transactions on Semiconductor Manufacturing, 29(1), 22-32.
Mazurowski, M. A., Zurada, J. M., & Tourassi, G. D. (2008). Selection of examples in case-based computer-aided decision systems. Physics in Medicine & Biology, 53(21), 6079.
Pereira, I., & Madureira, A. (2013). Self-optimization module for scheduling using case-based reasoning. Applied Soft Computing, 13(3), 1419-1432.
Sharaf-El-Deen, D. A., Moawad, I. F., & Khalifa, M. E. (2014). A new hybrid case-based reasoning approach for medical diagnosis systems. Journal of Medical Systems, 38(2), 9.
Smiti, A., & Elouedi, Z. (2010). Coid: Maintaining case method based on clustering, outliers and internal detection. In Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing 2010 (pp. 39-52). Springer, Berlin, Heidelberg.
Smiti, A., & Elouedi, Z. (2013, April). Using clustering for maintaining case based reasoning systems. In 2013 5th International Conference on Modeling, Simulation and Applied Optimization (ICMSAO) (pp. 1-6). IEEE.
Smiti, A., & Elouedi, Z. (2014, June). Maintaining case based reasoning systems based on soft competence model. In International Conference on Hybrid Artificial Intelligence Systems (pp. 666-677). Springer, Cham.
Smiti, A., & Elouedi, Z. (2018). SCBM: soft case base maintenance method based on competence model. Journal of Computational Science, 25, 221-227.
Yan, A., Song, H., & Wang, P. (2016). Case-based reasoning model with genetic algorithms, group decision-making and template reduction. International Journal on Artificial Intelligence Tools, 25(02), 1550032.
Editor's Notes
30s
Hello everyone, and thank you for coming. I’m here to present my thesis, entitled Case-Based Reasoning Founded on Randomization. This work is supervised by Pr. Stuart Rubin from the Naval Information Warfare Center Pacific and by Dr. Lydia Bouzar.
30s
Before we get started, I would like to dedicate this presentation to Pr. T. Bouabana-Tebibel, may God have mercy on her. She was the owner of this project and my supervisor during the first year of my Ph.D. studies, before her passing. While she was alive, she gave me a great push to continue working on the project.
30s
So, the outline of this presentation is:
An introduction to the research area
A state of the art on our main keywords
Then we will present our contributions in different application domains
Next, we will discuss the findings and results of our work
So let’s start with the introduction.
30s
The nature of the world rests on two tenets: first, problems tend to recur; second, similar problems have similar solutions. Human intelligence comes from understanding these two principles.
1m
Humans solve the problems they confront on a daily basis by drawing on previous experiences and adapting their solutions to the newly encountered problems.
Case-based reasoning is a system that imitates human thinking. It has a case base that stores experiences captured from the real world, called cases. When a new problem comes, the system searches for the most similar case and adapts its solution to the current problem.
Clearly, with a static, non-evolving case base, the problem resolution process won’t be accurate. On the other hand, a massive case base may slow down the resolution process.
1m
Case-based reasoning is a methodology that was developed in the nineties. It gained great interest and popularity at that time.
It stores experiences and previously resolved problems in its case base. It is composed of four modules whose main functions are to:
Retrieve the cases most similar to the current problem,
Reuse the retrieved case(s) by adapting them to attempt to solve the problem,
Revise the proposed solution, and
Retain the new solution as part of a new case.
Along with the solution, case-based reasoning provides explanations and a description, which make the solution more trustworthy and reliable.
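To make the four steps concrete, here is a minimal Python sketch of one pass through the cycle. It is only an illustration: the `Case` class, the `similarity` measure (fraction of matching features), the copy-only reuse step and the toy feature values are all assumptions, not the system presented in this thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    problem: tuple   # feature values describing the problem
    solution: str    # the stored solution


def similarity(p, q):
    """Fraction of features on which two problems agree (illustrative)."""
    return sum(a == b for a, b in zip(p, q)) / len(p)


def retrieve(case_base, problem):
    """Retrieve: return the stored case most similar to the problem."""
    return max(case_base, key=lambda c: similarity(c.problem, problem))


def reuse(retrieved, problem):
    """Reuse: the retrieved solution is simply copied (no adaptation)."""
    return Case(problem, retrieved.solution)


def revise(candidate, is_valid):
    """Revise: keep the proposed case only if the validity check passes."""
    return candidate if is_valid(candidate) else None


def retain(case_base, case):
    """Retain: store the revised case unless it is already present."""
    if case is not None and case not in case_base:
        case_base.append(case)


def solve(case_base, problem, is_valid=lambda case: True):
    candidate = reuse(retrieve(case_base, problem), problem)
    revised = revise(candidate, is_valid)
    retain(case_base, revised)
    return revised.solution if revised else None


# Toy case base; the feature values are invented for the example.
case_base = [
    Case(("round", "circumscribed"), "benign"),
    Case(("irregular", "spiculated"), "malignant"),
]
answer = solve(case_base, ("round", "obscured"))
```

After the call, the new problem and its copied solution are stored as a third case, so the case base evolves with each resolved problem.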
1m15s
Now the question is: how do we ensure accuracy and efficiency in our case-based reasoning system?
Current solutions attempt to feed the case base using inference methods. While this is effective, it has several problems:
The first problem is that inference leads to a massive case base.
The second problem is that inference is a long, time-consuming process.
The third problem is that inference is applied to rules, which are valid but hard to obtain, unlike cases, which are experiences taken from the world as is, with no effort.
To counter these problems, we propose to use randomization on cases as a way to generate knowledge.
However, cases generalized by randomization are not necessarily valid. This is why we propose to validate the generated data before using them for future problem solving.
30s
In brief, randomization is a non-inferential transmutation used to generalize and to generate new knowledge.
By definition, randomization transforms one knowledge segment into another by making random changes.
An example of randomization is the mutation operation in a genetic algorithm.
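As a small illustration of such a random change, the Python sketch below mutates one feature of a problem description, in the spirit of a genetic-algorithm mutation. The `mutate` function, the feature domains and the seed are assumptions made for the example only.

```python
import random


def mutate(problem, domains, rng):
    """Return a copy of `problem` with one randomly chosen feature changed.

    Each domain is assumed to hold at least two values, so an
    alternative value always exists.
    """
    index = rng.randrange(len(problem))
    alternatives = [v for v in domains[index] if v != problem[index]]
    mutated = list(problem)
    mutated[index] = rng.choice(alternatives)
    return tuple(mutated)


# Hypothetical feature domains (shape, margin).
domains = [("round", "oval", "lobular"), ("circumscribed", "obscured")]
rng = random.Random(0)  # seeded so that the run is repeatable
original = ("round", "circumscribed")
variant = mutate(original, domains, rng)
```

The variant differs from the original in exactly one feature, which is why such a change is not guaranteed to produce valid knowledge.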
45s
There are three main benefits of using randomization in case-based reasoning:
The first is to extract explicit knowledge from implicit knowledge without deteriorating the competence of the case base.
The second is that randomization can greatly enlarge the reasoning space. Knowledge generalized by randomization is not necessarily valid: the error metric is not always zero, but it is bounded and thus controllable.
The third is to keep the case base optimized and compressed.
6m45s
Having explained the motivation of our research, we will now look at the main related works.
1m45s
Case-based reasoning can solve analytic tasks, which are:
Classification, which associates an instance with a solution in its purest form.
Diagnosis, where the targets are diseases in medical problems and faults in technical fields.
Prediction, which associates an instance with a class representing an event that has not yet occurred (e.g. weather prediction).
Case-based reasoning is also effective for solving synthetic tasks, which are:
Configuration: given a set of components, configuration is the task of selecting a subset of components and choosing their parameters to specify a system.
Planning, whose results are series of executable actions. A plan is a dynamic object that controls behavior and tells what one has to do.
Design, which has elements of both of the other types. The particularity of this task is that the problem should not be seen as decomposable even if the reality is decomposed.
1m30s
Most of the research on case-based reasoning for analytic tasks focuses on the case base maintenance problem. The case base receives so much attention because storing all of the acquired experiences in their natural form can improve the accuracy of the system, but it leads to managing a massive, redundant, and non-optimized case base. This systematically degrades the resolution time.
45s
Another problem that case-based reasoning faces is how to retrieve the cases that are most useful for resolving the problem. Note that the most similar cases are not necessarily the most useful ones for resolving the current problem.
1m
Even though most research trends focus on case base maintenance and case retrieval, a minority of research contributes to reuse and retain.
For reuse, there are common adaptation rules, and other rules that are specific to the application domain.
The aim of retaining is to store the most useful cases without redundancy, to keep the case base optimized and maintained. It consists of deciding, for each captured case, whether to store it in the case base or not. Retaining generally depends on the case base structure and cannot be treated separately.
25s
Revise generally comes after the adaptation made by the reuse module. It is highly related to the application domain and thus to the form of the case. Research on the revise module is quite scarce. Principally, it relies on automatic coherence verification or on the expert’s evaluation.
1m10s
What about research for synthetic tasks?
Retrieval in synthetic tasks is a challenge for case-based reasoning. In this particular kind of task, the structure of the case base highly influences the retrieval process. Many solutions propose to:
- Structure the cases into a hierarchy of sub-components
- Decompose the case into small solvable problems and, when a new problem appears, reconstitute a case in the reuse module
- Or model the case in Petri net language
6m30s
Now we will see our contributions
45s
We have contributions on both analytic and synthetic tasks. We proposed three randomization approaches for three different tasks. Their application domains were chosen based on the availability and popularity of datasets.
Based on the limitations we observed in these approaches, we chose one of them and extended it to propose a new implementation of the case-based reasoning modules.
50s
This overview is common to all our randomization approaches. We added two external modules to our case-based reasoning system to amplify the case base. The case base is shared between the system and the external modules.
The first module is for randomization, where we generate data using cases stored in the case base. Then, because the generated cases are not necessarily valid, they are verified and validated in the validation module before being stored in the case base.
To ease the randomization process, the case base is structured and represented in segments containing pairs or trios, depending on the randomization technique.
So let’s see the particularity of each randomization technique.
1m20s
Route Planning
The first contribution is for the route planning problem. The aim is to make an agent travel from a start point to an end point while necessarily crossing some predefined waypoints. The challenge is in discovering the roads to go through.
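A minimal Python sketch of the problem setting (not of the randomization approach itself) is given below. It assumes that the waypoints must be crossed in a given order and that every road segment has the same cost, so a breadth-first search per leg is enough. The graph and the function names are hypothetical.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of nodes or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def route_via(graph, start, waypoints, end):
    """Chain the shortest legs start -> waypoints (in order) -> end."""
    stops = [start, *waypoints, end]
    route = [start]
    for a, b in zip(stops, stops[1:]):
        leg = shortest_path(graph, a, b)
        if leg is None:
            return None          # some leg cannot be travelled
        route.extend(leg[1:])    # skip the node shared with the previous leg
    return route


# Toy directed road network, invented for the example.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
route = route_via(graph, "A", ["C"], "E")
no_route = route_via(graph, "E", [], "A")
```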
Scheduling
The second contribution is about the scheduling problem, which has several constraints:
The activities’ precedence constraint
The need to allocate to each activity the duration and the resources it needs
The need not to exceed the resource capacity of the system
The aim is to minimize the global process duration.
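These constraints and the objective can be written down as a small feasibility check. The Python sketch below assumes a single renewable resource, integer start times and invented activity data; it only checks a given schedule and does not build one.

```python
def is_feasible(schedule, durations, demands, predecessors, capacity):
    """Check precedence and resource-capacity constraints.

    `schedule` maps each activity to its integer start time.
    """
    finish = {a: schedule[a] + durations[a] for a in schedule}
    # Precedence: an activity starts only after all its predecessors finish.
    for activity, preds in predecessors.items():
        if any(schedule[activity] < finish[p] for p in preds):
            return False
    # Capacity: the demand of running activities never exceeds the capacity.
    for t in range(max(finish.values())):
        load = sum(demands[a] for a in schedule if schedule[a] <= t < finish[a])
        if load > capacity:
            return False
    return True


def makespan(schedule, durations):
    """Global process duration: the latest finish time."""
    return max(schedule[a] + durations[a] for a in schedule)


# Hypothetical project: C must wait for A.
durations = {"A": 2, "B": 3, "C": 1}
demands = {"A": 2, "B": 1, "C": 2}
predecessors = {"C": ["A"]}
schedule = {"A": 0, "B": 0, "C": 2}

ok = is_feasible(schedule, durations, demands, predecessors, capacity=3)
total = makespan(schedule, durations)
```

Among the feasible schedules, the one with the smallest makespan is preferred.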
Mammography Mass
Our third contribution is to classify a mammographic mass as either benign or malignant according to the values of the BI-RADS features and the patient’s age
BI-RADS: Breast Imaging-Reporting and Data System
30s
The randomizations applied to scheduling systems and to mammographic mass classification are similar in that they are performed on the problem part of the case only, which generates new cases with new problems and existing solutions. In contrast, the randomization for route planning is performed on the problem part and on the solution part of cases separately, which generates new cases with new problem parts and existing solutions, or new solutions to existing problems.
Proposing a randomization technique also means proposing a new case base segmentation, in order to maintain the case base and give it an optimal structure. This was accomplished by decomposing the cases into segments.
30s
The validation of cases after randomization is mainly based on coherence verification. This can be good enough for technical domains like scheduling systems and route planning. But it is not enough for medical domains, because in such applications coherent cases are not necessarily valid.
For this reason, two more validation layers are added: stochastic validation and absolute validation.
40% of problems are resolved using cases newly generated by randomization for route planning, and 30% for scheduling systems.
The route planning case base grows by 140%
91% of the resolved problems are better than the Serial Schedule Generation Scheme benchmark, with an improvement of 2 to 15%
An improvement of 8% in the resolution of mammography mass problems is seen after randomization
90% of the generated cases are already valid, which shows the efficiency of our randomization approach
Lastly, the problem resolution time for all these application areas is linear, which is a good result
1m45s
However, we identified a few limitations of these approaches:
Validation of generated cases needs the intervention of an expert. This means that when the number of cases increases, relying on the expert is time-consuming and leads to human errors due to the repetitive nature of the task.
The case base is controlled by the randomization process, not by the case-based reasoning, which reduces its competence for retrieving cases
The contributions are mainly based on the external modules and on the case base. No contributions are made inside the case-based reasoning system
The validation module is disconnected from the case-based reasoning, which does not even know that an external validation module exists, even though that module accomplishes the same task that the revise module must perform. This is an unnecessary duplication of the same piece of work.
6m
40s
In the next contribution, we go inside the case-based reasoning and explore the new structure of the case base and how the system modules are implemented. It is an extension of the mammographic mass contribution, generalized to medical diagnosis applications.
1m30s
We have a brand new case-based reasoning system with a case base and its four modules.
The case base is segmented in a way that eases the retrieval of similar cases and eases the randomization process.
The randomization module is automatically executed when new cases occur.
We proposed many similarity functions to retrieve cases.
These functions are based on the feature weights, which are calculated in an external module.
The reuse of similar cases is based on three different solutions that we are going to see in the next slides.
The revise module is composed of three different layers
The retain is executed by storing the new case or incrementing its frequency
The rule base creates rules that are used in the reuse module and in the absolute validation inside the revise module
Let’s start with the external modules
1m10s
The feature selection and weighting module weights features according to their impact on the solution of cases and selects the most important ones.
For each feature, we calculate:
The number of pairs of cases with the same solution and similar feature values,
And the number of pairs with different solutions and different feature values
These two totals, v and z, represent how much this feature can influence the solution of the case
The other values, x and y, represent how much the feature is disconnected from the solution and has a random impact on it
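The pair counting described above can be sketched as follows. The notes do not give the exact weight formula, so the ratio (v + z) / (v + x + y + z) used here is an assumption, as are the assignment of x and y to the two remaining pair types, the numeric `tolerance` used to decide that two values are "similar", and the function names.

```python
from itertools import combinations


def pair_counts(cases, feature, tolerance=0):
    """Count the four kinds of case pairs for one numeric feature.

    v: same solution, similar values      z: different solutions, different values
    x: same solution, different values    y: different solutions, similar values
    (the x/y assignment is an assumption)
    """
    v = x = y = z = 0
    for (p1, s1), (p2, s2) in combinations(cases, 2):
        similar = abs(p1[feature] - p2[feature]) <= tolerance
        same_solution = s1 == s2
        if same_solution and similar:
            v += 1
        elif same_solution:
            x += 1
        elif similar:
            y += 1
        else:
            z += 1
    return v, x, y, z


def feature_weight(cases, feature, tolerance=0):
    """Hypothetical weight: share of pairs where the feature agrees with the solution."""
    v, x, y, z = pair_counts(cases, feature, tolerance)
    total = v + x + y + z
    return (v + z) / total if total else 0.0
```

With this definition, a feature whose values always move together with the solution gets a weight of 1, and a feature unrelated to the solution gets a lower weight.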
1m40
The external module for rule generation aims to extract rules from the case base. These rules help provide solutions to the problems in the reuse module, and take part in the absolute validation process in the revise module.
The algorithm for rule generation is as follows:
For each possible solution of a case, create a tree. The illustration is an example of a tree created from these four cases having a benign solution.
The features are ranked by weight from the most important to the least important, and the non-valuable features are ignored
Rules are extracted from this tree by reducing the paths to the minimum
For example: if this path doesn’t exist in any other tree, then we can say that when BI-RADS is equal to 3 the solution is benign
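The idea of reducing paths to the minimum can be sketched without building the trees explicitly. In this simplified stand-in, each case is read as a path over the features in decreasing order of importance, and the shortest prefix of that path that never occurs for another solution becomes a rule. The data layout and the name `extract_rules` are assumptions for the example, not the thesis algorithm.

```python
def extract_rules(cases, ordered_features):
    """Return a set of (conditions, solution) rules.

    cases: list of (problem_dict, solution) pairs (assumed layout).
    ordered_features: features sorted from most to least important.
    conditions: tuple of (feature, value) pairs forming a minimal path prefix.
    """
    rules = set()
    for problem, solution in cases:
        path = tuple((f, problem[f]) for f in ordered_features)
        for length in range(1, len(path) + 1):
            prefix = path[:length]
            # The prefix clashes if a case with another solution shares it.
            clash = any(
                other_solution != solution
                and all(other_problem[f] == value for f, value in prefix)
                for other_problem, other_solution in cases
            )
            if not clash:
                rules.add((prefix, solution))
                break
    return rules
```

On a case base where BI-RADS 3 only ever appears with benign cases, this produces the one-condition rule "BI-RADS = 3 implies benign", matching the example above.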
10s
Let’s see how the case base is segmented
50s
The case base is divided into segments. Each segment is represented by a delegate, which gives an overview of the problem part of the cases contained in that segment, and by a class, which is the solution shared by all cases contained in that segment.
The segment is divided into levels. The highest level contains cases that are the most similar to the delegate and the lowest level contains cases that are the least similar to the delegate.
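A possible data structure for such a segment is sketched below. The number of levels, the way a similarity value is mapped to a level, and the simple overlap similarity are all assumptions made for illustration; the notes do not specify them.

```python
from dataclasses import dataclass, field


def overlap_similarity(a, b):
    """Fraction of shared features on which two problems have equal values."""
    shared = [f for f in a if f in b]
    if not shared:
        return 0.0
    return sum(a[f] == b[f] for f in shared) / len(shared)


@dataclass
class Segment:
    delegate: dict            # overview of the problem part of the segment's cases
    solution: str             # class shared by every case in the segment
    n_levels: int = 3         # hypothetical number of similarity levels
    levels: dict = field(default_factory=dict)

    def add(self, problem):
        """Store a problem in the level matching its similarity to the delegate."""
        similarity = overlap_similarity(problem, self.delegate)
        # Highest level index = most similar to the delegate.
        level = min(int(similarity * self.n_levels), self.n_levels - 1)
        self.levels.setdefault(level, []).append(problem)
        return level
```

Retrieval can then start from the delegate and only descend into the levels of the segments whose delegate is close to the new problem.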
Now let’s see how retrieve and reuse are carried out
1m
We have three solutions for the retrieve and reuse:
The first one is to retrieve the most similar cases using similarity functions and copy their solutions
The second one is to retrieve the most similar segment delegates using our similarity functions and copy their solutions
The third one is to give the problem different candidate solutions and test them using our generated rules, keeping the solutions that succeed and rejecting those that fail
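The three options above can be sketched as three small functions. The weighted exact-match similarity is only one possible similarity function, and the data layouts for cases, segments and rules, as well as all function names, are assumptions made for this example.

```python
def weighted_similarity(a, b, weights):
    """Share of the total feature weight on which problems a and b agree."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    matched = sum(
        w for f, w in weights.items() if f in a and f in b and a[f] == b[f]
    )
    return matched / total


def reuse_nearest_case(problem, cases, weights):
    """Option 1: copy the solution of the most similar stored case."""
    _, solution = max(
        cases, key=lambda c: weighted_similarity(problem, c[0], weights)
    )
    return solution


def reuse_nearest_delegate(problem, segments, weights):
    """Option 2: copy the solution of the segment with the closest delegate."""
    best = max(
        segments,
        key=lambda s: weighted_similarity(problem, s["delegate"], weights),
    )
    return best["solution"]


def reuse_by_rules(problem, candidate_solutions, rules):
    """Option 3: keep the candidate solutions supported by at least one rule."""
    accepted = []
    for solution in candidate_solutions:
        supported = any(
            rule_solution == solution
            and all(problem.get(f) == v for f, v in conditions)
            for conditions, rule_solution in rules
        )
        if supported:
            accepted.append(solution)
    return accepted
```

Option 2 compares the problem with one delegate per segment rather than with every case, which is where the segmentation is expected to save retrieval time.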
Lastly, we see the revise and the retain modules
1m40s
The revise module is composed of three layers:
Coherence verification is a necessary but not sufficient step. The coherence of the case is checked using the stochastic grammar that we built. Only coherent cases are stored in the case base
Stochastic validity is the probability that the case is valid. It is based on different parameters, such as the frequency of the case in the case base, the initial distribution of the case before randomization, and how similar this case is to one of the valid cases.
Stochastic validity is dynamic, since the values of these parameters may change when the case base is updated with new cases.
Cases with a validity above the threshold are moved to the absolute validation layer. This layer gives a final answer about the validity of the case using rules
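The three revise layers, followed by the retain step, can be sketched as a short pipeline. The notes do not give the formula that combines the stochastic validity parameters, so the plain average below is an assumption, as are the "pending" status for cases still under the threshold, the callbacks, and the function names.

```python
from collections import Counter


def stochastic_validity(frequency, max_frequency, prior, similarity_to_valid):
    """Hypothetical score in [0, 1]: plain average of the three parameters."""
    frequency_term = frequency / max_frequency if max_frequency else 0.0
    return (frequency_term + prior + similarity_to_valid) / 3


def revise(case, is_coherent, validity, threshold, rules_accept):
    """Run the three layers in order and return a status string."""
    if not is_coherent(case):          # layer 1: coherence verification
        return "rejected"
    if validity(case) <= threshold:    # layer 2: stochastic validity
        return "pending"               # may pass later, since validity is dynamic
    return "valid" if rules_accept(case) else "rejected"  # layer 3: absolute


def retain(frequencies, problem, solution):
    """Store a new case or increment the frequency of an existing one."""
    key = (tuple(sorted(problem.items())), solution)
    frequencies[key] += 1
    return frequencies[key]
```

A `Counter` is used for the retain step so that storing a new case and incrementing the frequency of a known case are the same operation.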
9m
30s
For the experiments on our system, we opted for two of the most popular datasets used for machine learning in the medical field: the mammography mass dataset, where most of the features are categorical, and the thyroid disease dataset, where all of the features are numerical.
1m
To test our approach, we developed four prototypes with different configurations:
NR means that no knowledge amplification using randomization is performed on the case base
On the other hand, R means that the case base is amplified using randomization
Next, NS means that no specific structure is applied to the case base. It’s just a flat case base
In contrast, S means that the case base is structured in segments with delegates and levels of similarity
1m
To validate our approach, we conducted three types of experiments:
White box experiments, where we test each module separately. The aim is to show that each module gives correct results. For the purposes of this presentation, we won’t present the white box experiments.
Black box experiments, where we test the presented prototypes as a whole to evaluate their reliability and performance
Benchmarks, whose purpose is to compare our prototypes with related work, including machine learning. Benchmarks are considered black box experiments.
50s
We are using three different metrics to evaluate our prototypes:
Resolution capacity, which gives the ratio of answered problems, whether or not the solution is correct
Accuracy
And the F1 score, which is mostly used in domains where the risks are asymmetric, meaning that false negatives are more costly than false positives.
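The three metrics can be computed as in the sketch below. Two choices here are assumptions rather than definitions taken from the thesis: an unanswered problem is represented by `None`, and accuracy and F1 are computed over the answered problems only.

```python
def evaluate(predictions, truths, positive):
    """Return (resolution capacity, accuracy, F1) for one prototype run.

    predictions: predicted labels, with None for unanswered problems (assumed).
    truths: the expected labels, in the same order.
    positive: the label treated as the positive class for F1.
    """
    answered = [(p, t) for p, t in zip(predictions, truths) if p is not None]
    capacity = len(answered) / len(truths) if truths else 0.0
    correct = sum(p == t for p, t in answered)
    accuracy = correct / len(answered) if answered else 0.0
    tp = sum(p == positive and t == positive for p, t in answered)
    fp = sum(p == positive and t != positive for p, t in answered)
    fn = sum(p != positive and t == positive for p, t in answered)
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return capacity, accuracy, f1
```

For example, with five problems of which four are answered and two are answered correctly, the resolution capacity is 0.8 and the accuracy is 0.5.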
The best results for both datasets are seen when randomization is applied without segmentation
50s
When randomization is applied and no segmentation is set up for the case base, we see an enormous resolution time compared to the other prototypes. This is due to the massive, flat and poorly organized case base, which slows down the retrieval of similar cases.
The other three prototypes have close resolution times. A classic case-based reasoning system with no randomization and no segmentation is the fastest. But this comes with the lowest accuracy, as we have seen in the previous slide, which makes it uninteresting.
45s
The prototype with randomization and no segmentation gives the best accuracy compared with all the machine learning results. This is seen on the mammography mass dataset, where most of the features are categorical. On the thyroid disease dataset, in contrast, R-NS performs quite close to machine learning. This is due to the specificity of the dataset, where all of the features are numerical. Machine learning is known to perform very well when the data is numerical but less well when the data is categorical.
45s
We compare the resolution accuracy with many related works. These results are taken from well-known, ranked journals as well as conference papers from the last decade. The selected papers clearly report their accuracies and use the same datasets as we do.
The best accuracies are seen when randomization is applied, but they are lower when segmentation is combined with randomization. The worst accuracies among all related works are seen when randomization is not applied.
The same holds for the thyroid disease experiments
1m
When the dataset contains more categorical data, our approach performs better than the related work and than machine learning. But when the dataset contains more numerical data, their performance is close.
When no randomization is applied to the case base, the system gives the lowest accuracies, but when randomization is applied, the system gives the best accuracies among all related work. The problem is that the resolution time with randomization is severely degraded
From this point, we see the value of the segmentation, where we lose 2 to 3% of accuracy to get the answer more than 100 times faster.
In conclusion, the most recommended system is the one with randomization and segmentation
6m30
So let’s jump into the state of the art
30s
Our initial objectives were:
- Propose a Randomization Approach for Knowledge Amplification
- Build a CBR with the proposed Randomization
- Apply the Full Approach on Real Datasets
The aim of these contributions is to provide a case-based reasoning implementation that is able to accurately solve problems in relatively minimal time.
1m
The main challenges we faced are:
Research on case-based reasoning (CBR) has declined in favor of machine learning (ML) and deep learning (DL)
Machine learning tools are known for their good, hard-to-beat accuracies, which are generally better than those of CBR. However, accuracy is not all that is needed. We need to understand the ‘why’ behind the provided results, especially when we are dealing with sensitive domains such as medicine and security. This increases comprehension and trustworthiness, which are as important as accuracy. CBR is an explainable approach that can provide details about how a result was obtained.
Using randomization in CBR is a recent direction and is still little studied
30s
Our achievements are:
Proposing randomization approaches for planning, design and diagnosis tasks
Integrating them into case-based reasoning
And testing and validating them on real datasets
One of these randomization approaches is extended, and a new implementation of the four case-based reasoning modules is proposed. Again, this new approach is tested and validated on real datasets
40s
As perspectives, we would like to
improve our approach by adding a layer for conversational case-based reasoning. It is a module that helps the user formulate questions
Then we would like to create a framework for our approaches so that one fits all analytic tasks and the other fits all synthetic tasks
Another interesting perspective is to create a hybrid system, ML combined with CBR, to benefit from the advantages of both systems and to reduce their limits.
Three conference papers, one for each randomization approach proposed
One journal paper in Expert Systems with Applications, with an impact factor of 5.452 and a CiteScore of 11. It is ranked Q1 by Scimago and A+ by our Ministry of Higher Education.
Plus one doctoral symposium paper presented at CIIA, where we won the first best presentation award