Sales, C. M. D., & Wakker, P. P. (2010). Combining metric and qualitative approach in a measure of similarity for ill-structured data sets. Paper presented at the XVII Meeting of the Portuguese Association of Data Classification and Analysis JOCLAD, Lisbon.
1. Combining metric and qualitative approach in a measure of similarity for ill-structured data sets
Célia M. D. Sales & Peter P. Wakker
JOCLAD 2010, Lisboa
2. Metric-Frequency Similarity Measure (MF)
Sales, C. M. D., & Wakker, P. P. (2009). The metric-frequency measure of similarity for ill-structured data sets, with an application to family therapy. British Journal of Mathematical and Statistical Psychology, 62, 663-682.
Software for calculation available:
http://people.few.eur.nl/wakker/miscella/mf.similarity.calculate/mfsimexplanation.htm
2 Sales & Wakker, JOCLAD 2010
3. The Team
Célia M. D. Sales
Dept. of Psychology and Sociology, Universidade Autónoma de
Lisboa (CIP/UAL)
Center for Social Research and Intervention (CIS-ICSTE/IUL)
Peter P. Wakker
Dept. of Economics, H13-27, Erasmus University, Rotterdam, The
Netherlands
Acknowledgements:
Angela Fragoeiro
Francisco Ortega Beviá
Sónia Noronha
4. Outline
Why develop the MF
MF rationale
MF formula explained
Illustration in a real case
Next steps
5. The challenge
The development of the MF arose from a problem in
psychotherapy research
Individualized change measures
Self-report measures
The patient draws up a list of problems and rates how much each
problem bothers him or her
Each questionnaire is unique as it varies in the number of items and in
their content
Example: Zarastro
6. Mother’s complaints
1. There were death threats to the mother.
2. I had to leave the house many times.
3. Family was very upset because of the constant threats to me.
4. He was 16 years without relating to the outside, broke up with
lifetime friends.
5. In the family we had continuous fights and arguments.
6. We called the police 5 times, in 2 years.
7. We went to a series of private psychiatrists.
8. He did very poorly in school.
9. We had short quiet times, going back to the threats, creating distress.
10. He was very strange and aggressive.
7. Zarastro’s Complaints
1. I’m very shy.
2. I don’t know how to keep a conversation going.
3. It’s hard to maintain my few friendships.
4. It bothers me to have eye-contact with people on the street.
5. I feel that people are watching me.
6. I’m not able to show my dislike or distress.
7. I’m worried about not having a job or schooling/studies/education.
8. I have a cold relationship with my younger brother.
9. It’s hard to talk and show my complaints at home.
10. I’m obsessed with the past.
8. His brother José
1. My brother Z. has a distorted perception of the relationships at home.
2. My brother Z. has difficulties relating to others.
3. My brother Z. has been losing his friends.
4. Lack of emotional communication.
5. My brother Amadeus is often impolite towards Zarastro
6. My brother Amadeus is very cold and reserved, doesn’t interact.
7. My brother Amadeus has lost interest in Zarastro’s problem.
8. My sister has a severe depression.
9. It affects my job because it decreases my attention.
10. It affects my relationships with others.
11. I feel down when I have to take care of Zarastro.
9. His brother Amadeus
1. I feel anxious sitting at the table between Zarastro and my mother.
2. My mother makes her children depend on her.
3. My mother wants her cubs surrounding her.
4. Zarastro is watching too much T.V.
5. We (extended family) share the same fate.
10. The challenge
To what extent do family members have a similar perception of the
problems in the family?
11. Ill-structured data set
All family members have a different questionnaire
corresponding to their personal view of the existing
problems;
There is no control over the content of the items that
each family member can raise;
There is no limit to the number of items that are
conceivable;
Each person is free to add new items or delete previous
items in subsequent administrations of the questionnaire;
Each item is rated on a Likert scale.
12. MF: The rationale
Similarity must consider:
numerical differences
presence or absence of features (number of items raised)
MF combines both metric and frequency components and
is targeted towards situations in which the number of
aspects is unpredictable.
13. MF: The rationale
Amos Tversky (1977)
Assessment of similarity between stimuli described as a
comparison of features, rather than by the computation of
metric distance between points that represent objects
Tversky's formula applies to similarity measurements where
items are either present or absent, but have no degrees of
intensity
MF extends this to the case where intensities have also
been measured
The primary purpose of the MF is to be widely applicable,
so that it can handle situations that are ill structured and complex
14. Imagine two members of a family (mother and father)
Each indicated a number of "problems" (items) and scored how serious they feel these problems are (1-7 scale)
Items not raised receive a score of 0

FATHER           MOTHER
Items  Scores    Items  Scores
A      7         A      5
B      6         B      6
C      1         C      2
D      1         D      1
E      3         H      1
F      2
G      2

j = number of ("joint") items raised by both (items A, B, C, D = 4);
f = number of items raised by the father and not by the mother (items E, F, G = 3);
m = number of items raised by the mother and not by the father (item H = 1)
The total number of items is j + f + m (= 4 + 3 + 1 = 8).
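The counts j, f and m in this example can be reproduced with a short Python sketch. The dictionaries and the helper name `count_items` are illustrative assumptions, not part of the authors' software; the scores are copied from the table on this slide.

```python
# Item -> score (1-7 scale) for each respondent, copied from the slide's example.
father = {"A": 7, "B": 6, "C": 1, "D": 1, "E": 3, "F": 2, "G": 2}
mother = {"A": 5, "B": 6, "C": 2, "D": 1, "H": 1}


def count_items(first, second):
    """Return (j, f, m): joint items, items only in `first`, items only in `second`.

    Hypothetical helper for illustration only.
    """
    joint = first.keys() & second.keys()        # raised by both
    only_first = first.keys() - second.keys()   # raised by the first person only
    only_second = second.keys() - first.keys()  # raised by the second person only
    return len(joint), len(only_first), len(only_second)


j, f, m = count_items(father, mother)
print(j, f, m, j + f + m)  # 4 3 1 8
```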
15. MF: The formula explained
Score Similarity + Frequency Similarity → MF overall Similarity
16. Stage 1: Score Similarity

Items  Normalized score   Normalized score   |diff|   Similarity
       of the father      of the mother               1 - |diff|
A      1                  5/7                2/7      5/7
B      6/7                6/7                0        7/7
C      1/7                2/7                1/7      6/7
D      1/7                1/7                0        7/7
E      3/7                0                  3/7      4/7
F      2/7                0                  2/7      5/7
G      2/7                0                  2/7      5/7
H      0                  1/7                1/7      6/7
Sum                                                   45/7

Score similarity:
$$\frac{1}{j+f+m}\sum \bigl(1-|\mathrm{diff}|\bigr)$$
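As a minimal sketch of Stage 1, the table can be recomputed with exact fractions. The function name `score_similarity` and the constant `SCALE_MAX` are assumptions made for this illustration; dividing by 7 reflects the 1-7 scale used in the example, with items not raised scored 0.

```python
from fractions import Fraction

SCALE_MAX = 7  # top of the 1-7 rating scale used in the example

father = {"A": 7, "B": 6, "C": 1, "D": 1, "E": 3, "F": 2, "G": 2}
mother = {"A": 5, "B": 6, "C": 2, "D": 1, "H": 1}


def score_similarity(first, second, scale_max=SCALE_MAX):
    """Average of 1 - |diff| over all items raised by either person.

    Hypothetical helper; assumes at least one item has been raised.
    """
    items = first.keys() | second.keys()  # the j + f + m items
    total = Fraction(0)
    for item in items:
        a = Fraction(first.get(item, 0), scale_max)   # items not raised score 0
        b = Fraction(second.get(item, 0), scale_max)
        total += 1 - abs(a - b)
    return total / len(items)


w = score_similarity(father, mother)
print(w, f"{float(w):.4f}")  # 45/56 0.8036
```

The sum of the last column (45/7) divided by the 8 items gives 45/56, roughly 0.80.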
17. Score Similarity + Frequency Similarity → MF overall Similarity
Score similarity:
$$\frac{1}{j+f+m}\sum \bigl(1-|\mathrm{diff}|\bigr)$$
A 0-1 scaled similarity measure based only on the average differences between the scores of the father and the mother.
18. Stage 2: Frequency similarity
Frequency similarity has two components:
Similarity based on the number of items raised jointly by the father and the mother
(Dis)similarity based on the difference in the number of items raised by the father and the mother
19. Step 2.1 – Similarity based on the number
of items raised jointly
Reflected by a number j/N
N is a normalization factor that ensures that j/N, f/N, and
m/N never exceed 1.
N should be the same for all participants whose mutual
similarity weights are calculated
Thus, it should exceed the maximum number of items raised
by any single participant in the group considered. For instance,
it can be the maximum number of conceivable items.
In our example, N = 20 has been chosen, so that j/N = 4/20 =
0.2.
20. Step 2.1 – Similarity on the number of items
raised jointly
Instead of the number j/N, we will use the square-root transformation
$\sqrt{j/N}$.
The transformation is curved downwards (concave):
similarity increases less for high values of j (and j/N) than
for low values.
Thus, an increase from j=0 to j=1 has more impact on
mother and father similarity than an increase from j=17
to j=18, which is plausible.
In our example, the transformation yields
$\sqrt{0.2} \approx 0.45$
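The concavity argument can be checked numerically. This sketch assumes the transformation is the square root of j/N, which is consistent with the value 0.45 reported for j/N = 0.2; the function name `joint_component` is hypothetical, and N = 20 is the value chosen in the example.

```python
import math

N = 20  # normalization factor chosen in the slide's example


def joint_component(j, n=N):
    """Square-root transform of j/n (assumed form of the concave transformation)."""
    return math.sqrt(j / n)


print(round(joint_component(4), 2))  # 0.45

# Concavity: one extra joint item matters more at low j than at high j.
gain_low = joint_component(1) - joint_component(0)
gain_high = joint_component(18) - joint_component(17)
print(gain_low > gain_high)  # True
```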
21. Step 2.2 – (Dis)similarity on the difference of the number of items

FATHER           MOTHER
Items  Scores    Items  Scores
A      7         A      5
B      6         B      6
C      1         C      2
D      1         D      1
E      3         H      1
F      2
G      2

f = number of items raised by the father and not by the mother (items E, F, G = 3)
m = number of items raised by the mother and not by the father (item H = 1)

Difference in the number of items: $|f - m|$
Normalized difference: $|f/N - m/N|$
Converted to a similarity: $1 - |f/N - m/N|$
22. Score Similarity + Frequency Similarity → MF overall Similarity
Score similarity:
$$\frac{1}{j+f+m}\sum \bigl(1-|\mathrm{diff}|\bigr)$$
Frequency similarity:
$$\frac{\sqrt{j/N} + 1 - |f/N - m/N|}{2}$$
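A minimal sketch of the frequency similarity for the father and mother example follows. It assumes the square-root form of the joint-item term and uses j = 4, f = 3, m = 1 and N = 20 from the example; the function name `frequency_similarity` is hypothetical.

```python
import math


def frequency_similarity(j, f, m, n):
    """Midpoint of the joint-item term and the item-count balance term."""
    joint = math.sqrt(j / n)           # Step 2.1: concave transform of j/N
    balance = 1 - abs(f / n - m / n)   # Step 2.2: 1 - |f/N - m/N|
    return (joint + balance) / 2


# Father and mother: j = 4, f = 3, m = 1, with N = 20.
print(round(frequency_similarity(4, 3, 1, 20), 3))  # 0.674
```

Here the balance term is 1 - |3/20 - 1/20| = 0.9 and the joint term is about 0.45, so the frequency similarity is about 0.67.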
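23. Stage 3: MF overall similarity
$$\mathrm{MF} = \tfrac{1}{2}\cdot\frac{1}{j+f+m}\sum \bigl(1-|\mathrm{diff}|\bigr) + \tfrac{1}{4} + \tfrac{1}{4}\sqrt{j/N} - \tfrac{1}{4}\,|f/N - m/N|$$
The MF measure is the midpoint (equal-weight average) of the score similarity and the frequency similarity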
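Putting the three stages together, the overall MF for the father and mother example can be sketched as below. This is an illustration under stated assumptions, not the authors' software: the function name `mf_similarity` is hypothetical, scores are normalized by the scale maximum of 7, the joint-item term uses a square root, and N = 20 as in the example.

```python
import math

father = {"A": 7, "B": 6, "C": 1, "D": 1, "E": 3, "F": 2, "G": 2}
mother = {"A": 5, "B": 6, "C": 2, "D": 1, "H": 1}


def mf_similarity(first, second, n, scale_max=7):
    """Midpoint of score similarity and frequency similarity (illustrative sketch)."""
    items = first.keys() | second.keys()
    j = len(first.keys() & second.keys())   # items raised by both
    f = len(first.keys() - second.keys())   # items raised only by `first`
    m = len(second.keys() - first.keys())   # items raised only by `second`

    # Stage 1: average of 1 - |diff| on normalized scores (items not raised score 0).
    score = sum(
        1 - abs(first.get(i, 0) - second.get(i, 0)) / scale_max for i in items
    ) / len(items)

    # Stage 2: joint-item term and item-count balance term.
    frequency = (math.sqrt(j / n) + 1 - abs(f / n - m / n)) / 2

    # Stage 3: equal-weight midpoint.
    return 0.5 * score + 0.5 * frequency


print(round(mf_similarity(father, mother, n=20), 3))  # 0.739
```

With a score similarity of 45/56 (about 0.80) and a frequency similarity of about 0.67, the midpoint is about 0.74.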
24. Pre-Treatment

                     Zarastro   Mother   José
Mother    w          0.45
          f          0.58
          overall    0.51
José      w          0.33       0.33
          f          0.55       0.52
          overall    0.44       0.43
Amadeus   w          0.29       0.37     0.33
          f          0.49       0.49     0.45
          overall    0.39       0.43     0.39

[Figure: pre-treatment dendrogram of Zarastro, Mother, José and Amadeus. Stress formula 1 = 0.0328; stress formula 2 = 0.1054; r(monotonic) squared = 0.9889; r-squared (p.v.a.f.) = 0.9319]

Post-Treatment

                     Zarastro   Mother   José
Mother    w          0.81
          f          0.55
          overall    0.68
          (change)   (0.17)
José      w          0.81       0.75
          f          0.54       0.58
          overall    0.68       0.67
          (change)   (0.24)     (0.24)
Amadeus   w          0.71       0.72     0.70
          f          0.49       0.50     0.44
          overall    0.60       0.61     0.57
          (change)   (0.21)     (0.18)   (0.18)

[Figure: post-treatment dendrogram of Zarastro, Mother, José and Amadeus. Stress formula 1 = 0.0000; stress formula 2 = 0.0000; r(monotonic) squared = 1.0000; r-squared (p.v.a.f.) = 0.9944]

w: the score-similarity; f: the frequency-similarity; overall: the overall similarity; within
parentheses is the pre-post change, given by the difference in overall similarity between the two times.
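The pre-post changes can be recomputed from the overall similarities reported on this slide. The sketch below only uses the rounded values shown here; the dictionary names are illustrative assumptions.

```python
# Overall MF similarity per pair, as reported on the slide (rounded to 2 decimals).
pre = {
    ("Mother", "Zarastro"): 0.51,
    ("José", "Zarastro"): 0.44,
    ("José", "Mother"): 0.43,
    ("Amadeus", "Zarastro"): 0.39,
    ("Amadeus", "Mother"): 0.43,
    ("Amadeus", "José"): 0.39,
}
post = {
    ("Mother", "Zarastro"): 0.68,
    ("José", "Zarastro"): 0.68,
    ("José", "Mother"): 0.67,
    ("Amadeus", "Zarastro"): 0.60,
    ("Amadeus", "Mother"): 0.61,
    ("Amadeus", "José"): 0.57,
}

# Pre-post change: difference in overall similarity between the two times.
change = {pair: round(post[pair] - pre[pair], 2) for pair in pre}

for pair, delta in change.items():
    print(f"{pair[0]}-{pair[1]}: {delta:+.2f}")
```

Every pair shows an increase in overall similarity after treatment, between 0.17 and 0.24.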
25. Conclusion
The MF is pragmatic and easily applicable to data sets
with little structure
In particular, it is not necessary to anticipate which variables
will be observed, or how many, and the variables may be
metric or qualitative
26. Next Steps
New software for MF calculation, available on-line
Applying MF in psychotherapy research:
Comparing alternative ways of entering data (categorized vs. raw items)
Implementing the MF in software for tracking patient progress
(Family Therapy and Group Therapy)
Applying MF in other fields:
Comparing agreements in open-ended judgments