A new way to amplify the case-based reasoning (CBR) knowledge base using randomization. The method amplifies knowledge without degrading the CBR's resolution time, and it was applied to determine the severity of mammography masses in patients.
randomization approach in case-based reasoning: case of study of mammography mass
1. Randomization Approach in Case-Based Reasoning: case of study of mammography mass
Authors:
• Miled Basma BENTAIBA-LAGRID (LCSI, ESI)
• Dr. Lydia Bouzar-Benlabiod (LCSI, ESI)
• Prof. Stuart H. Rubin (SPAWAR)
• Prof. Thouraya Bouabana-Tebibel (LCSI, ESI)
• Maria Roumeissa Hanini (LCSI, ESI)
2. Introduction
• The concept of problem solving for humans
• Case-based reasoning imitates human thinking
• An evolutionary case-base versus a static case-base
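4. Definitions
Case / Case-Base
• Case: { Problem } → solution
• Form of the case
$C_j: \{t_1: v_{j,1},\ t_2: v_{j,2},\ \ldots,\ t_n: v_{j,n}\} \rightarrow S_j$
• Example
$C_1: \{\text{BI-RADS}: 5,\ \text{age}: 67,\ \text{shape}: \text{lobular},\ \text{margin}: \text{spiculated},\ \text{density}: \text{low}\} \rightarrow \text{malignant}$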
5. Motivation
How to increase the accuracy and efficiency of CBR's problem resolution?
Current Solutions
• Feed the case-base using inference methods
Problem
• A massive case-base may slow down the CBR's search
Proposed Solution
• Amplify the knowledge using randomization
Problem
• The generated cases may not be valid
Solution
• Validate the generated cases before their use
6. Related Work: Randomization
• A generic method was proposed (S. H. Rubin)
• Traveling robots
• Scheduling systems
• Refrigerator design
• Cyber context
• Military strategies
7. Proposed Approach: Global Overview
[Diagram: Case-Based Reasoning (Retrieve, Reuse, Revise, Retain) · Segmented case-base · Knowledge amplification using randomization · Validation Module (coherence verification, stochastic validation, absolute validation). Transition labels: "coherent cases", "stochastic validity > validity threshold", "stochastic validity < validity threshold", "store the case", "new iteration".]
8. Proposed Approach: The Segmented Case-Base (Similarities)
• Solution part similarities
• Problem part similarities
  • Depends on the values of the attributes
  • Qualitative attributes
  • Quantitative attributes
9. Proposed Approach: The Segmented Case-Base (Attributes Weighting)
• V: highly similar, same solution
• X: highly similar, different solution
• Y: slightly similar, same solution
• Z: slightly similar, different solution
$W = (V + Z) - (X + Y)$
10. Proposed Approach: The Segmented Case-Base (Knowledge Structuring)
Case form: C: problem (P) → solution (S)
[Diagram: the case-base is split into sectors, each represented by a solution value (S = X1, …, S = Xm). A sector holds segments, each represented by a solution (S = X1) and a delegate (P1, …, Pk, …, Pj). Inside a segment, cases are arranged in levels (level 1, level 2, …, level n): P highly similar to the delegate, P less highly similar to the delegate, …, P slightly similar to the delegate.]
11. Proposed Approach: Knowledge Amplification Using Randomization
$C_a: \{t_1: v_{a,1},\ t_2: v_{a,2},\ \ldots,\ t_j: v_{a,j},\ t_{j+1}: v_{a,j+1},\ \ldots,\ t_n: v_{a,n}\} \rightarrow S$
$C_b: \{t_1: v_{b,1},\ t_2: v_{b,2},\ \ldots,\ t_j: v_{b,j},\ t_{j+1}: v_{b,j+1},\ \ldots,\ t_n: v_{b,n}\} \rightarrow S$
Let $W_1 > W_2 > \ldots > W_j > W_{j+1} > \ldots > W_n$ be the weights of the attributes $t_1, t_2, \ldots, t_j, t_{j+1}, \ldots, t_n$
• $C_c: \{t_1: v_{b,1},\ t_2: v_{b,2},\ \ldots,\ t_j: v_{b,j},\ t_{j+1}: v_{a,j+1},\ \ldots,\ t_n: v_{a,n}\} \rightarrow S$
• $C_d: \{t_1: v_{a,1},\ t_2: v_{a,2},\ \ldots,\ t_j: v_{a,j},\ t_{j+1}: v_{b,j+1},\ \ldots,\ t_n: v_{b,n}\} \rightarrow S$
12. Proposed Approach: Verification and Validation (Three-Step Validation)
[Diagram: Segmented case-base · Knowledge amplification using randomization · Validation Module (coherence verification, stochastic validation, absolute validation). Transition labels: "coherent cases", "stochastic validity > validity threshold", "store the case", "new iteration", "stochastic validity < validity threshold".]
13. Proposed Approach: Verification and Validation (Coherence Verification)
• Verification of the problem part
• Verification of the solution part
• Verification of the coherence of the case
14. Proposed Approach: Verification and Validation (Stochastic Validation)
Parameters to calculate the stochastic validity:
• The ratio of the frequency of the case to the frequency of its problem part, regardless of the solution
• How similar is the case to a fully valid case? Are the solutions the same or not?
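The first parameter of the stochastic validity can be sketched in Python as a simple frequency ratio. This is only an illustration under stated assumptions: cases are modelled as hashable `(problem, solution)` tuples, the function names and the `0.5` threshold are hypothetical, and the slides do not say how this ratio is combined with the similarity to a fully valid case, so that part is left out.

```python
from collections import Counter

def frequency_ratio(case, case_base):
    """Frequency of the full case divided by the frequency of its problem part."""
    problem, solution = case
    case_counts = Counter(case_base)
    problem_counts = Counter(p for p, _ in case_base)
    if problem_counts[problem] == 0:
        return 0.0  # assumption: an unseen problem part gives no evidence
    return case_counts[(problem, solution)] / problem_counts[problem]

def passes_threshold(case, case_base, threshold=0.5):
    """Gate on the ratio alone; the threshold value is illustrative."""
    return frequency_ratio(case, case_base) > threshold

# Made-up toy case-base: problem part = (shape, density)
base = [
    (("lobular", "low"), "malignant"),
    (("lobular", "low"), "malignant"),
    (("lobular", "low"), "benign"),
    (("round", "high"), "benign"),
]
ratio = frequency_ratio((("lobular", "low"), "malignant"), base)
```

Here the problem part `("lobular", "low")` occurs three times, twice with the solution `malignant`, so the ratio is 2/3.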
15. Proposed Approach: Verification and Validation (Absolute Validation)
• Cases that have a high stochastic validity are verified using absolute validation.
• The absolute validation is carried out by the expert.
16. Application Domain: Mammography Mass (Attributes)
• BI-RADS: from 1 to 5
• Age: from 0 to 100
• Shape: round, oval, lobular, irregular
• Margin: circumscribed, micro-lobulated, obscured, ill-defined, spiculated
• Density: high, iso, low, fat-containing
• Severity: benign, malignant
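The attribute domains can be written down as a lookup table, which also gives a simple starting point for checking the problem part and the solution part of a generated case. This is a minimal Python sketch, not the authors' implementation: the key names, the spellings "micro-lobulated" and "spiculated", and the two function names are assumptions, and the slides do not define the check of the coherence of the case as a whole, so it is not attempted here.

```python
# Attribute domains as listed on the slide (key names are hypothetical)
DOMAINS = {
    "bi_rads": range(1, 6),    # 1 to 5
    "age": range(0, 101),      # 0 to 100
    "shape": {"round", "oval", "lobular", "irregular"},
    "margin": {"circumscribed", "micro-lobulated", "obscured",
               "ill-defined", "spiculated"},
    "density": {"high", "iso", "low", "fat-containing"},
}
SEVERITIES = {"benign", "malignant"}

def problem_part_ok(problem):
    """True if the problem has exactly the known attributes, each within its domain."""
    if set(problem) != set(DOMAINS):
        return False
    return all(problem[name] in DOMAINS[name] for name in DOMAINS)

def solution_part_ok(solution):
    """True if the solution is one of the known severity classes."""
    return solution in SEVERITIES

example = {"bi_rads": 5, "age": 67, "shape": "lobular",
           "margin": "spiculated", "density": "low"}
```

With these definitions, `problem_part_ok(example)` holds, while a case with `age` set to 140 or with a missing attribute is rejected.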
17. Application Domain: Mammography Mass (Formalization / Data)
Data
• UCI Machine Learning Repository (961 cases, 688 non-duplicated)
• Form of the case
$\{BI,\ age,\ shape,\ margin,\ density\} \rightarrow \{severity\}$
• Example
$C_1: \{\text{BI-RADS}: 5,\ \text{age}: 67,\ \text{shape}: \text{lobular},\ \text{margin}: \text{spiculated},\ \text{density}: \text{low}\} \rightarrow \{\text{malignant}\}$
20. Conclusion
Main contributions are:
(1) Randomization technique for amplification,
(2) Segmentation of the case-base,
(3) Validation process based on three layers,
(4) Identification of the severity of a mammography mass.
23. Randomization Approach in Case-Based Reasoning: case of study of mammography mass
Authors:
• Miled Basma BENTAIBA-LAGRID bm_bentaiba@esi.dz
• Dr. Lydia Bouzar-Benlabiod l_bouzar@esi.dz
• Prof. Stuart H. Rubin stuart.rubin@navy.mil
• Prof. Thouraya Bouabana-Tebibel t_tebibel@esi.dz
• Maria Roumeissa Hanini m_hanini@esi.dz
Editor's Notes
Humans can solve the problems they confront on a daily basis. One way of doing so is to use previous experiences and adapt their solutions to newly encountered problems.
CBR imitates human thinking. It has a case-base, where a case is an experience captured from the real world.
It is clear that with a static, non-evolving case-base the problem resolution process won't be efficient. On the other hand, a massive case-base may slow down the resolution process. This is why we use a randomization process: to amplify the case-base without affecting the resolution time. Don't worry, I will explain what randomization is in the next slides.
Randomization was first defined by G. Chaitin (1975): information or knowledge can be effectively compressed until the representation of the compressed information is random, or in other words, pattern-less.
Randomization was first used for knowledge amplification purposes by Professor Tebibel. Randomization allows us to amplify the knowledge while keeping the case-base optimized.
Randomization is domain specific
A generic method was proposed (S. H. Rubin)
The formalization of the problem may vary from one domain to another
Randomization has been used for different purposes, not only for knowledge amplification. In our work, we are focusing on knowledge amplification using randomization
Solution part similarities
Two cases are totally similar if they have the same class of solution, and not similar if they have different solution parts
Problem part similarities
They depend on the values of the attributes
There are two types of attributes: quantitative and qualitative
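A minimal Python sketch of these two kinds of similarity follows. The slides do not give the actual formulas, so everything here is an assumption made for illustration: qualitative attributes are compared by exact match, quantitative attributes by a distance normalised over the attribute's range, and the problem part similarity is taken as a plain average. All function names are hypothetical.

```python
def solution_similarity(s1, s2):
    """Totally similar (1.0) for the same solution class, not similar (0.0) otherwise."""
    return 1.0 if s1 == s2 else 0.0

def attribute_similarity(a, b, value_range=None):
    """Qualitative: exact match. Quantitative: 1 - normalised distance (assumed)."""
    if value_range is None:            # qualitative attribute
        return 1.0 if a == b else 0.0
    low, high = value_range            # quantitative attribute
    return 1.0 - abs(a - b) / (high - low)

def problem_similarity(p1, p2, ranges):
    """Unweighted mean over attributes; `ranges` maps quantitative names to (low, high)."""
    sims = [attribute_similarity(p1[k], p2[k], ranges.get(k)) for k in p1]
    return sum(sims) / len(sims)

ranges = {"age": (0, 100)}
p1 = {"age": 67, "shape": "lobular"}
p2 = {"age": 57, "shape": "oval"}
sim = problem_similarity(p1, p2, ranges)
```

In this toy example the ages differ by 10 out of a range of 100 (similarity 0.9) and the shapes differ (similarity 0.0), so the averaged problem part similarity is 0.45.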
Before running the randomization process, we must identify the most important attributes, that is, those for which a small change in value can change the solution.
So for each attribute, we compare every case with every other case and count the number of pairs where the solution is the same and the number where it is different, separately for highly similar and slightly similar attribute values (the counts V, X, Y and Z).
If the attribute values are highly similar and the cases have the same solution, it may be this attribute that led to that solution. Similarly, if the attribute values are slightly similar and the cases have different solutions, it may be this attribute that made the solutions differ.
The opposite holds for X: the attribute values are similar but the cases have different solutions, so the attribute doesn't have a big impact on the solution. The same reasoning applies to Y.
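The counting behind W = (V + Z) - (X + Y) can be sketched in Python as below. This is an illustration under assumptions, not the authors' code: cases are `(problem_dict, solution)` pairs, the caller supplies a per-attribute similarity function returning a value in [0, 1], and the cut-off between "highly" and "slightly" similar (`threshold=0.8`) is hypothetical because the slides do not give it.

```python
from itertools import combinations

def attribute_weight(cases, attr, similar, threshold=0.8):
    """Compute W = (V + Z) - (X + Y) for one attribute over all pairs of cases."""
    v = x = y = z = 0
    for (p1, s1), (p2, s2) in combinations(cases, 2):
        high = similar(p1[attr], p2[attr]) >= threshold
        same = s1 == s2
        if high and same:
            v += 1   # highly similar, same solution
        elif high:
            x += 1   # highly similar, different solution
        elif same:
            y += 1   # slightly similar, same solution
        else:
            z += 1   # slightly similar, different solution
    return (v + z) - (x + y)

def exact(a, b):
    """Toy similarity for qualitative values."""
    return 1.0 if a == b else 0.0

# Made-up cases: shape separates the classes, density does not
cases = [
    ({"shape": "round", "density": "low"}, "benign"),
    ({"shape": "round", "density": "low"}, "benign"),
    ({"shape": "irregular", "density": "low"}, "malignant"),
]
w_shape = attribute_weight(cases, "shape", exact)
w_density = attribute_weight(cases, "density", exact)
```

In this toy data `shape` gets the higher weight (V = 1, Z = 2, so W = 3), while `density`, which is identical across both classes, gets a negative one (V = 1, X = 2, so W = -1).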
Let us take two cases $C_a$ and $C_b$ that belong to the same level of a segment. If we consider that numbAttr = j, then:
$C_a: \{t_1: v_{a,1},\ t_2: v_{a,2},\ \ldots,\ t_j: v_{a,j},\ t_{j+1}: v_{a,j+1},\ \ldots,\ t_n: v_{a,n}\} \rightarrow S$
$C_b: \{t_1: v_{b,1},\ t_2: v_{b,2},\ \ldots,\ t_j: v_{b,j},\ t_{j+1}: v_{b,j+1},\ \ldots,\ t_n: v_{b,n}\} \rightarrow S$
Let $W_1 > W_2 > \ldots > W_j > W_{j+1} > \ldots > W_n$ be the weights of the attributes $t_1, t_2, \ldots, t_j, t_{j+1}, \ldots, t_n$
The transformation that interchanges the j highest-weighted attributes yields the two following newly generated cases:
$C_c: \{t_1: v_{b,1},\ t_2: v_{b,2},\ \ldots,\ t_j: v_{b,j},\ t_{j+1}: v_{a,j+1},\ \ldots,\ t_n: v_{a,n}\} \rightarrow S$
$C_d: \{t_1: v_{a,1},\ t_2: v_{a,2},\ \ldots,\ t_j: v_{a,j},\ t_{j+1}: v_{b,j+1},\ \ldots,\ t_n: v_{b,n}\} \rightarrow S$
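The interchange of the j highest-weighted attributes can be sketched in Python as follows. The representation (cases as `(problem_dict, solution)` pairs, weights as a dict), the function name `randomize_pair`, and the sample values and weights are all assumptions made for the illustration.

```python
def randomize_pair(case_a, case_b, weights, j):
    """Swap the j highest-weighted attribute values between two cases
    that share the same solution, yielding the new cases Cc and Cd."""
    (pa, sa), (pb, sb) = case_a, case_b
    if sa != sb:
        raise ValueError("both cases must have the same solution S")
    # Attributes ordered by decreasing weight; keep the first j
    top = sorted(weights, key=weights.get, reverse=True)[:j]
    pc, pd = dict(pa), dict(pb)   # Cc starts from Ca, Cd from Cb
    for t in top:
        pc[t] = pb[t]             # Cc takes the top-j values from Cb
        pd[t] = pa[t]             # Cd takes the top-j values from Ca
    return (pc, sa), (pd, sb)

# Made-up weights and cases, for illustration only
weights = {"bi_rads": 5, "age": 4, "shape": 3, "margin": 2, "density": 1}
ca = ({"bi_rads": 5, "age": 67, "shape": "lobular",
       "margin": "spiculated", "density": "low"}, "malignant")
cb = ({"bi_rads": 4, "age": 58, "shape": "irregular",
       "margin": "ill-defined", "density": "iso"}, "malignant")
cc, cd = randomize_pair(ca, cb, weights, j=2)
```

With j = 2, `cc` carries the BI-RADS and age values of `cb` with the remaining attributes of `ca`, and `cd` is the mirror image. Both generated cases would still have to go through the validation steps before being stored.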
Red: without randomization
Yellow: with randomization without validation
Green: with randomization and stochastic validation
The proposed segmentation of the case-base helps the randomization process to be carried out. How is the segmented case-base used by the CBR? Define how Retrieve, Reuse, Revise and Retain are implemented.
How can absolute validation be made automatic or semi-automatic? Rule generation.
As for future work, we first plan to propose a more detailed way to perform the absolute validation, using a semi-automatic process that generates rules from both valid and invalid cases. The aim is to reduce the expert's intervention and to speed up the validation process. Then, we will focus on how the CBR's retrieve, reuse, revise and retain modules are implemented to benefit from the proposed structure of the case-base.