Journal of Autism and Developmental Disorders, Vol. 29, No. 1, 1999

The TOM Test: A New Instrument for Assessing Theory of Mind in Normal Children and Children with Pervasive Developmental Disorders

Peter Muris,1,4 Pim Steerneman,2 Cor Meesters,3 Harald Merckelbach,1 Robert Horselenberg, Tanja van den Hogen,3 and Lieke van Dongen3

This article describes a first attempt to investigate the reliability and validity of the TOM test, a new instrument for assessing theory of mind ability in normal children and children with pervasive developmental disorders (PDDs). In Study 1, TOM test scores of normal children (n = 70) correlated positively with their performance on other theory of mind tasks. Furthermore, young children only succeeded on TOM items that tap the basic domains of theory of mind (e.g., emotion recognition), whereas older children also passed items that measure the more mature areas of theory of mind (e.g., understanding of humor, understanding of second-order beliefs). Taken together, the findings of Study 1 suggest that the TOM test is a valid measure. Study 2 showed for a separate sample of normal children (n = 12) that the TOM test possesses sufficient test-retest stability. Study 3 demonstrated for a sample of children with PDDs (n = 10) that the interrater reliability of the TOM test is good. Study 4 found that children with PDDs (n = 20) had significantly lower TOM test scores than children with other psychiatric disorders (e.g., children with Attention-deficit/Hyperactivity Disorder; n = 32), a finding that underlines the discriminant validity of the TOM test. Furthermore, Study 4 showed that intelligence as indexed by the Wechsler Intelligence Scale for Children was positively associated with TOM test scores. Finally, in all studies, the TOM test was found to be reliable in terms of internal consistency. Altogether, results indicate that the TOM test is a reliable and valid instrument that can be employed to measure various aspects of theory of mind.
KEY WORDS: Theory of mind; pervasive developmental disorders; reliability.

1 Department of Psychology, University of Limburg, P.O. Box 616, 6200 MD Maastricht, The Netherlands.
2 South-Limburg Centre of Autism, c/o RIAGG-OZL, P.O. Box 165, 6400 AD Heerlen, The Netherlands.
3 Department of Experimental Abnormal Psychology, University of Limburg, P.O. Box 616, 6200 MD Maastricht, The Netherlands.
4 Address all correspondence to Peter Muris, Department of Psychology, University of Limburg, P.O. Box 616, 6200 MD Maastricht, The Netherlands.

0162-3257/99/0200-0067$16.00/0 © 1999 Plenum Publishing Corporation

INTRODUCTION

Recently, children's understanding of their own and others' mental states has been the focus of considerable interest. Research in this area is described under the general heading "theory of mind." Premack and Woodruff (1978) were the first to use the term to refer to the child's ability to ascribe thoughts, feelings, ideas, and intentions to others and to employ this ability to anticipate the behavior of others. According to Wellman (1990), theory of mind is a prerequisite for the understanding of the social environment and for engaging in socially competent behavior (see also Astington & Jenkins, 1995).

It has been proposed that autistic children are socially impaired precisely because they lack a theory of mind (Frith, 1989). In a series of studies, Baron-Cohen,
Leslie, and Frith (1985, 1986) demonstrated that the ability of autistic children to attribute mental states to others is seriously impaired. These researchers found that about 80% of the autistic children were unable to correctly predict the ideas of others, whereas most mentally retarded and normal controls of lower mental age were able to do so.

Specific programs have been developed to train theory of mind skills in autistic children. For example, in a study by Ozonoff and Miller (1995), five autistic children received a training program in which they were not only taught specific interactional and conversational skills but also received explicit and systematic instruction regarding the underlying social-cognitive principles necessary to infer the mental states of others (i.e., theory of mind). Pre- and posttreatment assessment demonstrated that the trained children improved on a number of false belief tasks compared to control children who had received no treatment. Similar positive results were obtained by Swettenham (1996), Hadwin, Baron-Cohen, Howlin, and Hill (1996), Bowler, Strom, and Urquhart (1993), and Whiten, Irving, and Macintyre (1993). All these studies were successful in that autistic children who had received training were able to pass theory of mind tasks. Furthermore, in a recent study of Steerneman, Jackson, Pelzer, and Muris (1996), socially immature (but not autistic) children were given a social skills intervention program that incorporated theory of mind principles. Results showed that this type of training produced positive effects on theory of mind tests. Yet, it should be added that the treatment effects found in these studies do not always generalize to nonexperimental settings or to tasks in domains where children received no teaching (see, for a discussion of this issue, Slaughter & Gopnik, 1996).

Given the availability of reasonably successful treatment programs, theory of mind assessment instruments are important for two reasons. First, such instruments can be used to identify those children who display deficits in theory of mind. Second, such instruments can be employed to evaluate the efficacy of theory of mind training programs.

The assessment of theory of mind in children has been predominantly confined to so-called "false belief" tasks. Such tasks intend to test children's comprehension of another person's wrong belief. An example is the so-called Smarties test (e.g., Hogrefe, Wimmer, & Perner, 1986). During this test, children are presented with a Smarties box and asked what it contains. Children are highly familiar with these boxes and know that they usually contain Smarties, a desirable chocolate candy. When children give an answer in this sense, they are shown that the box actually contains a pencil. Next, children are told that another child will be asked what is in the box. They are then asked the crucial question: "What do you think the other child will say?" From their answer on this question, one can infer whether children are able to make a judgment about another person's false expectation. That is, an understanding of another individual's false belief (and presence of theory of mind) is demonstrated if children predict that another person will think that there are Smarties in the box. Conceptual difficulty with false belief attribution (and absence of theory of mind) is revealed if children assume that another person will think that there is a pencil in the box.

Several authors have argued that theory of mind is more than just the comprehension of false belief. For example, Perner and Wimmer (1985) have described two other types of belief that play a crucial role in children's understanding of social interactions: first-order beliefs that refer to what children think about real events (e.g., "Michael thinks that Sophie is angry") and second-order beliefs that pertain to what children think about other people's thoughts (e.g., "Michael thinks that Sophie thinks that he's angry with her").

Flavell, Miller, and Miller (1993) argue that children develop a theory of mind along five successive stages. During the first stage, children adopt the concept of mind, that is, they attribute needs, emotions, and other mental states to people and use cognitive terms such as "know," "remember," and "think." During the second stage, children acknowledge that the mind has connections to the physical world. More specifically, they understand that certain stimuli lead to certain mental states, that these mental states lead to behavior, and that mental states can be inferred from stimulus-behavior links. During the third stage, children recognize that the mind is separate from and differs from the physical world. For example, they realize that a person can think about an object even though the object is not physically present. During the fourth stage, children learn that the mind can represent objects and events accurately or inaccurately. Thus, a representation can be false with respect to a real object or event (e.g., in a false belief task), behavior can be false with respect to a mental state (e.g., when a sad person smiles), and two people's perceptual views or beliefs can differ (i.e., perspective taking). During the fifth and final stage, children learn to understand that the mind actively mediates the interpretation of reality. For instance, children recognize that prior experiences affect current mental states which in turn affect emotions and social inferences. According to Flavell et al. (1993)
Stages 1-3 can best be regarded as theory of mind precursors. These authors assume that these stages "probably emerge in quick succession, for they are very closely related concepts having to do with the differentiation of, and relations between, the mind and the external world" (p. 101). The step from Stage 3 to 4, the emergence of a "real" theory of mind, probably comes more slowly (around the age of 6); Stage 5, the "more mature" theory of mind, would emerge still later.

Taken together, theory of mind refers to the child's capacity to analyze the behavior of others by recognizing the mental states (i.e., desires and beliefs) that underlie intentional behavior. Thus, theory of mind is a complex, developmental phenomenon, which certainly implies more than just the understanding of false belief. Obviously, there is a need for assessment tools that measure the developmental progression of theory of mind in a broader age range. One promising candidate in this respect is the Theory-of-Mind test (TOM test) designed by Steerneman (1994). The TOM test contains a variety of items that can be allocated to three subscales which correspond with the three main theory of mind stages as proposed by Flavell et al. (1993): (a) precursors of theory of mind (e.g., emotion recognition), (b) first manifestations of a real theory of mind (e.g., understanding of false belief), and (c) mature aspects of theory of mind (e.g., second-order beliefs). As a practical tool, the test provides information about the extent to which a child possesses social understanding, insight and sensibility, and the extent to which he or she takes the feelings and thoughts of others into account. The present article is concerned with the reliability and validity of the TOM test.

STUDY 1

The purpose of Study 1 was twofold. First, the construct validity of the TOM test was investigated. The TOM test intends to be a developmental scale. Therefore, it was anticipated that TOM test scores correlate positively with age. That is, as children grow older, their theory of mind develops, and hence they pass more TOM test items. Furthermore, one expects that younger children predominantly succeed on TOM items that tap the basic domains of theory of mind (e.g., emotion recognition), whereas older children should increasingly pass items that measure the more mature aspects of theory of mind (e.g., understanding of false belief, understanding of humor, second-order belief). A second purpose of Study 1 was to evaluate the concurrent validity of the TOM test. More specifically, its relationship with other, more traditional, indices of theory of mind and social development was examined.

Materials and Method

Subjects and Procedure

Seventy children (46 boys and 24 girls) recruited from a regular primary school (De Driesprong in Geleen, the Netherlands) participated in the study. The children ranged in age from 5 to 12 years. Ten children of each age level (i.e., 5, 6, 7, 8, 9, 10, and 11/12 years) were selected. All children were healthy, socially well-functioning, and none had learning difficulties. Thus, it can be assumed that they had normal intelligence. Children were tested at school in a private room with only the experimenter present. The assessment took place in two sessions. In one session, children underwent the TOM test. In another session, a series of alternative theory of mind or social development tasks was administered. The order of the sessions was counterbalanced within each age level group (i.e., half of the children started with the TOM test, while the other half first received the alternative battery of tests).

The New Theory of Mind Test

The TOM test comprises an interview that can be used in children between 5 and 12 years of age. The TOM test consists of vignettes, stories, and drawings about which the child has to answer a number of questions. The test lasts about 35 minutes and contains 78 items (i.e., questions). The TOM test contains three subscales: (a) precursors of theory of mind (i.e., TOM 1; 29 items; e.g., recognition of emotions, pretense), (b) first manifestations of a real theory of mind (i.e., TOM 2; 33 items; e.g., first-order belief, understanding of false belief), and (c) more advanced aspects of theory of mind (i.e., TOM 3; 16 items; e.g., second-order belief, understanding of humor). In the Appendix, examples of items of the three subscales are shown. Each TOM test item is scored as either failed (0) or passed (1). Accordingly, total TOM scores range between 0 and 78, with higher scores indicating a more mature theory of mind. TOM 1, TOM 2, and TOM 3 subscale scores vary between 0 and 29, 0 and 33, and 0 and 16, respectively.

Alternative, More Traditional, Indices of Theory of Mind and Social Development

A number of alternative indices of theory of mind and social development were employed in the current study.
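The scoring rule described above for the TOM test (each item scored 0 or 1, summed into three subscale scores and a total) can be sketched as follows. This is only an illustration under assumptions: the article does not list which item numbers belong to which subscale, so the input is assumed to be already grouped by subscale, and the function name score_tom and the dictionary layout are hypothetical rather than part of the published instrument. The sketch also reports the share of items passed per subscale, a quantity the Results of Study 1 use.

```python
# Subscale lengths as stated in the article: 29 + 33 + 16 = 78 items.
SUBSCALE_SIZES = {"TOM 1": 29, "TOM 2": 33, "TOM 3": 16}


def score_tom(responses):
    """Score one child's TOM test.

    responses: dict mapping each subscale name to a list of item scores,
    where every item is 0 (failed) or 1 (passed). This input layout is an
    assumption made for illustration only.
    """
    subscale_scores = {}
    for name, size in SUBSCALE_SIZES.items():
        items = responses[name]
        if len(items) != size:
            raise ValueError(f"{name} needs {size} items, got {len(items)}")
        if any(item not in (0, 1) for item in items):
            raise ValueError(f"{name} items must be scored 0 or 1")
        # A subscale score is simply the number of passed items.
        subscale_scores[name] = sum(items)

    return {
        "subscales": subscale_scores,
        # Total score ranges from 0 to 78.
        "total": sum(subscale_scores.values()),
        # Percentage of passed items per subscale.
        "success_pct": {
            name: 100.0 * score / SUBSCALE_SIZES[name]
            for name, score in subscale_scores.items()
        },
    }


if __name__ == "__main__":
    # Hypothetical child: passes 20 TOM 1 items, 10 TOM 2 items, no TOM 3 items.
    child = {
        "TOM 1": [1] * 20 + [0] * 9,
        "TOM 2": [1] * 10 + [0] * 23,
        "TOM 3": [0] * 16,
    }
    print(score_tom(child))
```

Keeping the subscale lengths in one table makes the 0-29, 0-33, and 0-16 ranges explicit and lets the function reject incomplete protocols instead of silently producing a lower score.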
The Sally and Anne test (see Baron-Cohen et al., 1985) is a false belief task. It consists of a comic-strip story in which Sally and Anne are first introduced: Sally with a basket in front of her and Anne with a box. Next, Sally is shown placing a ball in the basket and leaving the room. Anne is then shown taking the ball from the basket and placing it in the box. Following this, Sally returns and children are asked: "Where will Sally look for her ball?" If the children point to the previous location of the ball, they pass the task because they acknowledge Sally's false belief (score = 1). If, however, they point to the ball's current location, they fail the task by not taking into account Sally's false belief (score = 0).

The Smarties test (Hogrefe et al., 1986) was used as an alternative false belief task (see Introduction). Scores on this test also vary between 0 (failed) and 1 (passed).

Two tests of emotion recognition (Spence, 1980), the "Test of perception of emotion from facial expression" and the "Test of perception of emotion from posture cues" were administered. Children were asked to identify four basic emotions (happiness, fear, anger, and sadness) on pictures showing facial expressions or bodily postures. Scores on each test range between 0 and 4.

The Social Interpretation Test (SIT; Vijftigschild, Berger, & Spaendonck, 1969) examines the child's ability to interpret social situations adequately. The test consists of a colored picture depicting a street in which a number of events take place. The child has to answer 9 questions about the picture (e.g., "What has happened here?", "Why is the ambulance driving in the street?"). The answers are registered and classified into 24 categories. For each category, 1 point is given. SIT test scores range between 0 and 24 with higher scores reflecting greater ability to interpret social situations.

The Picture Arrangement subtest of the Wechsler Intelligence Scale for Children-Revised (WISC-R; Wechsler, 1974) was used as a measure of social sensibility. This subtest asks children to order 12 series of 4 pictures in such a way that each series of pictures depicts a sensible story (range 0-12).

The Role Taking test (Selman & Byrne, 1974) taps role taking skills of children. The test comprises a story of a social dilemma (a young girl has to save a little cat from a high tree, although she has just promised her father not to climb in trees anymore). Children are questioned about this story. From their answers on these questions, one can derive the level of role taking: egocentric role taking (i.e., the child is not able to differentiate between his/her own point of view and that of others, level 0); subjective role taking (i.e., the child recognizes his own point of view and that of others, level 1); self-reflective role taking (i.e., the child is able to adopt another person's perspective, level 2); and reciprocal role taking (i.e., the child weighs his perspective against that of others and finds a solution for the social dilemma, level 3).

The John and Mary test (Perner & Wimmer, 1985) assesses children's understanding of second-order beliefs. The test is an acted story in which two characters (John and Mary) are independently informed about an object's (an ice cream van) unexpected transfer to a new location. Hence both John and Mary know where the van is but there is a mistake in John's second-order belief about Mary's belief: "John thinks that Mary thinks that the van is still at the old place." Children's understanding of this second-order belief was tested by asking: "Where does John think Mary will go for the ice cream?" Scores on this test are either 0 (failed) or 1 (passed).

RESULTS AND DISCUSSION

General Results

Reliability of the TOM Test

The internal consistency of the TOM test was satisfactory, that is, Cronbach's alphas were .92 for the total TOM-scale, .84 for TOM 1, .86 for TOM 2, and .85 for TOM 3.

Age and Theory of Mind

Table I (right column) presents Pearson product-moment and point-biserial correlations between age, on the one hand, and theory of mind measures, on the other hand. As can be seen from this table, except for the Smarties test, all measures were positively and significantly associated with age. The absence of a connection between age and Smarties test performance is due to the fact that nearly all children in the present study, even the 5- to 6-year-olds, passed this test.

As expected, there was a robust correlation between TOM test and age: r(70) = .80, p < .001. Inspection of mean TOM scores per age level (see Table I) showed that theory of mind capability increased linearly as children grew older. This indicates that the TOM test has one crucial property of a developmental scale, namely, it is sensitive to maturation. With respect to this result,
two further remarks are in order. To begin with, it should be noted that the most pronounced increase in theory of mind took place between ages 6 and 7. This is in line with the findings of previous studies showing that children of that age display marked improvement in their performance on more complicated theory of mind tasks (e.g., Perner & Wimmer, 1985). Second, the TOM test also proved suitable to index differential development of theory of mind in older age groups (i.e., in 9-10- and 11-12-year-old children). Note that a number of the alternative tasks tap an aspect of theory of mind that most normal children master at a relatively early age. For example, from age 7 onwards about 90% of the children successfully pass the John and Mary test, whereas from age 8 onwards most children recognize the four basic emotions from facial expression (see Table I). This indicates that these tests are less sensitive to index differential development of theory of mind in older age groups.

Table I. Mean Scores of Children on Theory of Mind and Social Development Measures for Different Age Levels, and Pearson Product-Moment and Point-Biserial Correlations Between Age and Various Measures

                                          Age (in years)
                               5-6          7-8          9-10         11-12
Measure                        M     SD     M     SD     M     SD     M     SD     r with age
TOM test                       42.5  7.4    59.3  6.9    63.9  5.2    68.1  4.8    .80a
Emotion recognition-face       3.1   0.9    3.4   0.7    3.9   0.3    3.9   0.3    .50a
Emotion recognition-posture    2.4   1.1    2.7   1.2    3.4   0.9    3.7   0.7    .46a
Sally and Anne test            0.4   0.5    0.7   0.5    0.9   0.2    0.8   0.4    .48a
Smarties test                  0.9   0.3    0.9   0.2    1.0   0.0    1.0   0.0    .25
Social Interpretation test     7.2   3.0    8.8   2.6    13.5  2.8    14.7  2.4    .74a
WISC-R picture arrangement     3.2   3.0    8.3   2.0    9.7   1.6    9.4   1.2    .72a
Role taking test               0.5   0.6    1.6   0.8    2.0   0.6    2.3   0.7    .73a
John and Mary test             0.4   0.5    0.9   0.3    0.9   0.3    0.9   0.3    .44a

a p < .05/9 (i.e., Bonferroni correction).

Construct Validity of the TOM Test

As the TOM test intends to measure three successive developmental stages of children's theory of mind (i.e., precursors of theory of mind, first manifestations of a real theory of mind, mature theory of mind), one would expect that young children predominantly succeed on items that index the precursors of theory of mind, while at the same time they fail to pass items that measure the more mature aspects of theory of mind. For older ages, one would predict that an increasing number of children succeed on items that tap the more advanced areas of theory of mind. To examine this issue, for each age level (i.e., 5, 6, 7, 8, 9, 10, and 11/12 years) success percentages of the three TOM subscales were calculated (i.e., number of passed items on a subscale divided by the total number of items of that subscale). Figure 1 shows mean success percentages on the three TOM subscales for the various age levels. A 3 (Subscales) × 7 (Age Levels) multivariate analysis of variance performed on these data revealed a significant effect of age, F(6, 63) = 32.1, p < .001, indicating that TOM test performance improves with age. Furthermore, a significant effect of subscale, Fhot(2, 62) = 133.2, p < .001, emerged due to the fact that success percentages on TOM 1 (i.e., precursors of theory of mind) were higher than those on TOM 3 (i.e., mature theory of mind), whereas success percentages of TOM 2 (i.e., first manifestations of a real theory of mind) were in between. Finally, the interaction of subscale with age also reached significance, Fhot(12, 122) = 2.3, p < .05. As can be seen, 7-year-old children succeeded on the vast majority of TOM 1 and TOM 2 items (>80%), indicating that most of these children have passed the first two stages of theory of mind development. Note also that the mean success percentage on TOM 3 items in 5-year-old children was only 23.8%, whereas in 11- to 12-year-old children a success percentage of more than 80% is reached. Thus, as expected, children acquire advanced aspects of theory of mind at a relatively later age (i.e., after they have learned the more basic principles of theory of mind).

Concurrent Validity of the TOM Test

The relationships between TOM test and alternative indices of theory of mind were studied by means of
Pearson product-moment correlations. In cases where dichotomous variables were involved, point-biserial correlations were used. As can be seen in Table II, most theory of mind indices are significantly correlated with each other.

Fig. 1. Mean success percentages on the three TOM subscales calculated per age level.

Table II. Pearson Product-Moment and Point-Biserial Correlations Between TOM Test and Alternative Theory of Mind and Social Development Measures

Variable                          TOM    1     2     3     4     5     6     7     TOMa
1. Emotion recognition-face       .55b   -                                         .34
2. Emotion recognition-posture    .46b   .27   -                                   .30
3. Sally and Anne test            .50b   .42b  .30   -                             .17
4. Smarties test                  .37b   .45b  .30   .16   -                       .29
5. Social Interpretation Test     .61b   .38b  .48b  .29   .10   -                 .22
6. WISC-R picture arrangement     .77b   .45b  .44b  .49b  .27   .55b  -           .30
7. Role taking test               .75b   .55b  .40b  .40b  .27   .57b  .63b  -     .40
8. John and Mary test             .55b   .44b  .23   .45b  .20   .29   .54b  .54b  .18

a To control for age effects, Pearson and point-biserial correlations were computed for each age level and then averaged. Mean correlations thus obtained are shown in this column.
b p < .05/36 (i.e., Bonferroni correction).

At first sight, it seems appropriate to compute correlations between TOM test and alternative indices of theory of mind while controlling for age (i.e., partial correlations). However, by selecting 10 children of each age level, the design of Study 1 capitalized on the developmental progression of theory of mind. Thus, controlling for age would imply the elimination of an intrinsically important factor in both TOM and alternative tests (i.e., the developmental progression of theory of mind). To circumvent this problem, Pearson and point-biserial correlations between TOM test and concurrent measures were computed for each age level separately. The means of these separate correlations are presented in the right column of Table II. As can be seen, correlations attenuated considerably. Nevertheless, the TOM test was still positively associated with concurrent theory of mind indices. This result suggests that, as intended, the TOM test covers a broad range of theory of mind aspects.

STUDY 2

Study 2 intended to investigate another aspect of the reliability of the TOM test, namely, its test-retest stability. To examine this issue, 12 normal primary school children were tested twice with the TOM test, 8 weeks apart.

Method

Subjects and Procedure

Twelve children (8 boys and 4 girls) varying in age between 5 and 12 years from a regular primary school (De Pater van de Geld in Waalwijk, the Netherlands) participated in the study. All children were healthy, normal-functioning children. Children were interviewed with the TOM test twice, 8 weeks apart. Both interviews were conducted by the same experimenter in a separate room at school.

Results and Discussion

Internal Consistency

Internal consistency of the TOM test appeared to be sufficient: Cronbach's alphas were .95 for the total score, .62 for TOM 1, .94 for TOM 2, and .77 for TOM 3.

Test-Retest Reliability

Table III shows demographic variables (age and sex) of the children as well as their total TOM test scores on both occasions. As can be seen, TOM test scores increased with age; the Pearson correlation was .88 (p < .001). Note further that most children slightly improved their score on Occasion 2. A paired t test showed that this improvement was significant, t(11) = 5.4, p < .01. Most important, test-retest reliability for the TOM test was satisfactory; intraclass correlation (ICC) coefficients were .99 (p < .001) for the total score, .80 (p < .005) for TOM 1, .98 (p < .001) for TOM 2, and .91 (p < .001) for TOM 3. These results indicate that the TOM test has sufficient test-retest stability and that the test can be used to measure children's development or improvement in theory of mind capability.

Table III. Demographic Variables of Normal Children in Study 2, and Their Total TOM Test Scores on Both Occasions

                       TOM test scores (8 weeks apart)
Child    Sex    Age    Occasion 1    Occasion 2
1        M      5      40            41
2        M      6      46            48
3        M      6      46            54
4        F      7      41            45
5        M      8      56            56
6        F      8      62            67
7        M      9      62            65
8        M      9      63            68
9        M      10     66            67
10       F      11     65            71
11       M      11     73            74
12       F      12     71            77
M                      60.5          64.4
SD                     10.7          10.4

STUDY 3

The results presented so far suggest that the TOM test can be used as a measure of the efficacy of theory of mind training programs in children with pervasive developmental disorders (PDDs). Yet, as the TOM test is based on an interview with the child, data about the interrater reliability are needed. Study 3 addressed this issue. Ten children with PDDs were tested with the TOM test. Two independent observers classified the reactions of the children to each TOM test item as either failed or passed.

Method

Subjects and Procedure

Ten children (10 boys) with PDDs were randomly selected for the purpose of the present study. Age of the children ranged between 7 and 13 years. All children were treated in one of the AUTI-groups of the Pediatric Center Overbunde, Maastricht, The Netherlands. After
extensive psychodiagnostic and psychiatric screening, the children were assigned a diagnosis of Autistic Disorder or Pervasive Developmental Disorder Not Otherwise Specified (PDDNOS). The children fulfilled the relevant DSM-III-R criteria (American Psychiatric Association, 1987). Diagnoses were made by a specialized, multidisciplinary team of professionals of the Center of Autism South-Limburg. The main demographic characteristics of the children are shown in Table IV.

Table IV. Demographic Characteristics of 10 Boys and TOM Test Scores as Obtained by Both Observers

                                                            TOM test score
Child    Age (years;months)    DSM-III-R diagnosisa    IQb    Observer 1    Observer 2    Kappac
1        13;3                  PDDNOS                  92     75            75            1.00
2        12;9                  PDDNOS                  93     70            70            1.00
3        10;11                 AD                      82     44            45            0.87
4        7;6                   AD                      86     32            33            0.98
5        8;1                   PDDNOS                  93     61            59            0.97
6        11;2                  PDDNOS                  119    71            71            1.00
7        10;8                  PDDNOS                  92     60            59            0.96
8        12;3                  PDDNOS                  97     69            68            0.90
9        6;9                   PDDNOS                  96     35            33            0.90
10       7;10                  PDDNOS                  92     40            38            0.95

a PDDNOS = pervasive developmental disorder not otherwise specified; AD = autistic disorder.
b As indexed by the WISC-R.
c Interrater reliability (Cohen's kappa).

Children were tested in a silent room with two experimenters present. Five children were tested by Experimenter 1, while Experimenter 2 observed from a distance. For the other five children, Experimenter 2 administered the TOM test, while Experimenter 1 observed. Both experimenters monitored the responses and reactions of the children on-line. They were not able to observe each other's registrations.

Results and Discussion

Internal Consistency

Internal consistency of the TOM test was good; Cronbach's alphas were .98 for the total score, .95 for TOM 1, .97 for TOM 2, and .95 for TOM 3.

Interrater Reliability

Interrater reliability of the TOM test was examined by computing Cohen's kappa using scores of both observers for the 78 items of the test. Kappas were calculated for each child separately because this makes it possible to evaluate whether interrater reliability is affected by the level of theory of mind development of each child. As can be seen in the right panel of Table IV, the kappa values were high (i.e., all exceeded .87). Furthermore, both observers produced a highly similar rank order of the children with regard to theory of mind; Spearman rank correlation was .99, p < .001.

Altogether, the results of Study 3 indicate that the interrater reliability of the TOM test is good.

STUDY 4

Study 4 examined the discriminant validity of the TOM test. Various studies have concluded that a substantial proportion of the children with PDDs exhibit deficits in theory of mind. In most of these studies, theory of mind deficits have been demonstrated by means of false belief tasks (Baron-Cohen et al., 1985; Eisenmajer & Prior, 1991; Leslie & Frith, 1988; Perner, Frith, Leslie, & Leekam, 1989; Prior, Dahlstrom, & Squires, 1990). To investigate whether the TOM test is able to detect this specific deficit in children with PDDs, Study 4 compared TOM test scores of children with autism and PDDNOS with those of children who suffered from other psychiatric disorders (i.e., Attention-deficit/Hyperactivity Disorder, Anxiety Disorder).

There is evidence to suggest that intelligence is a moderator variable in performance on theory of mind tests (see, for a review, Happé, 1995). For example, Happé (1994) investigated the WISC-R scores of autistic children who either passed or failed a false belief task. Her results showed that passers had significantly higher IQ scores than failers. Most researchers in this domain
assume that it is especially verbal IQ that plays a role in the performance on false belief tasks (Happé, 1995). This may be relevant for the TOM test, as this test is essentially an interview instrument. Thus, it may well be the case that children's scores on this test are critically dependent on their verbal ability (i.e., language comprehension and/or expression ability). To examine this issue, WISC-R scores of the children in Study 4 were also obtained.

Method

Subjects and Procedure

The subjects of Study 4 consisted of three groups: a group of anxiety-disordered children, a group of children with Attention-deficit/Hyperactivity Disorder (ADHD), and a group of children with pervasive developmental disorders.

From the database (1996) of the children and youth section of the Community Mental Health Center, Eastern South-Limburg in Heerlen, The Netherlands, all children suffering from ADHD (n = 14) or an anxiety disorder (AnxD, i.e., obsessive-compulsive disorder, overanxious disorder, specific phobia, posttraumatic stress disorder, and separation anxiety disorder; n = 18) were selected. Children were classified on the basis of the DSM-III-R after extensive psychodiagnostic and psychiatric screening. As part of the intake procedure, all children completed the TOM test and the revised version of the Wechsler Intelligence Scale for Children (WISC-R; Wechsler, 1974).

Twenty high-functioning children with PDDs (i.e., 8 children with Autistic Disorder and 12 children with PDDNOS) also participated in Study 4. These children were chosen randomly from the database of the Center of Autism South-Limburg (see Study 3) and then interviewed with the TOM test. WISC-R scores of the PDD children were also available. Demographic characteristics (i.e., age, sex distribution, and IQ scores) of the three groups are shown in the upper part of Table V.

Table V. Demographic Characteristics and Mean TOM Test Scores for Children with Attention-deficit/Hyperactivity Disorder (ADHD), Children with an Anxiety Disorder (AnxD), and Children with a Pervasive Developmental Disorder (PDD)

Variable(a)   ADHD children   AnxD children   PDD children   F or χ²   p       Post hoc comparisons
              (n = 14)        (n = 18)        (n = 20)
Age           8.5 (0.9)       9.1 (1.9)       9.3 (2.4)      0.7       ns
Sex (m/f)     12/2            11/7            17/3           3.8       ns
TIQ           86.9 (7.1)      93.6 (12.7)     85.4 (12.9)    2.6       <.10    PDD<AnxD
VIQ           91.6 (12.0)     90.5 (11.9)     84.3 (16.1)    1.5       ns
PIQ           83.4 (9.1)      97.4 (14.3)     86.6 (10.9)    6.6       <.005   PDD<AnxD; ADHD<AnxD
TOM           61.1 (8.4)      58.9 (9.9)      39.1 (24.9)    9.2       <.001   PDD<AnxD; PDD<ADHD
TOM 1         23.5 (3.2)      23.1 (3.1)      16.9 (8.6)     7.2       <.005   PDD<AnxD; PDD<ADHD
TOM 2         27.5 (3.8)      26.7 (4.5)      16.8 (11.3)    10.9      <.001   PDD<AnxD; PDD<ADHD
TOM 3         9.5 (2.2)       8.5 (3.2)       4.9 (5.4)      6.4       <.005   PDD<AnxD; PDD<ADHD

(a) m = male; f = female; TOM = TOM total score; TOM 1 = precursors of theory of mind; TOM 2 = first manifestations of the real theory of mind; TOM 3 = mature theory of mind. Levels of intelligence were measured with the WISC-R.

Results and Discussion

Internal Consistency

As in the previous studies, the internal consistency of the TOM test was satisfactory; Cronbach's alphas of the total scale and the various TOM subscales varied between .87 and .96 for the total group, .95 and .98 for the children with PDD, and .72 and .80 for psychiatric control children.

Discriminant Validity

The lower part of Table V shows mean TOM test scores for the three groups. Analyses of variance followed up by post-hoc t tests revealed that children with PDD had significantly lower TOM test scores than children with ADHD and AnxD.

For this sample, the Pearson product-moment correlation between TOM test and age was only .24 (p < .10). Correlations between TOM test scores, on the one hand, and Total IQ, Verbal IQ, and Performance IQ, on the other hand, however, were all positive and significant; r(52)s were .58 (p < .001), .61 (p < .001), and .45 (p < .001), respectively. Thus, children with higher intelligence scores performed better on the TOM test.
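The two reliability statistics used throughout these studies, Cronbach's alpha for internal consistency and Cohen's kappa for interrater agreement, can be illustrated with a minimal sketch. It assumes items are scored 0 (failed) or 1 (passed), as in the TOM test. The function names and the toy data are hypothetical and are not taken from the study; they only show how the coefficients are computed.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one per child) of item scores."""
    k = len(scores[0])  # number of items
    # Variance of each item across children.
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of the children's total scores (must be non-zero).
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items."""
    n = len(rater_a)
    # Proportion of items on which both raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal proportions
    # (undefined when this equals 1, i.e. both raters use a single category).
    categories = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)


# Hypothetical toy data: 5 children x 4 items, scored 0/1.
toy_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
# Hypothetical scores of two observers for the same 10 items of one child.
observer_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
observer_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]

print(f"alpha = {cronbach_alpha(toy_scores):.2f}")           # alpha = 0.80
print(f"kappa = {cohen_kappa(observer_1, observer_2):.2f}")  # kappa = 0.60
```

In the toy example the two observers agree on 8 of 10 items, but because half of that agreement would be expected by chance, kappa is lower than the raw agreement rate. Computing kappa per child, as described for Study 3, amounts to calling the second function once for each child's pair of score lists.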
To examine the unique contribution of the diagnosis Pervasive Developmental Disorder to TOM test performance, two additional analyses were performed. First of all, a multiple regression analysis (forward stepwise) was carried out with Diagnosis Autism, Diagnosis PDDNOS (both dummy variables), Verbal IQ, Performance IQ, and Age as the predictors, and TOM test scores being the dependent variable. Results showed that Diagnosis Autism entered on the first step [r(52) = -.69, p < .001], accounting for 47.6% of the variance in TOM test scores. Verbal IQ (partial r = .32, p < .01), Age (partial r = .24, p < .05), and Diagnosis PDDNOS (partial r = -.23, p < .05) entered on the second, third, and fourth step of the regression equation, accounting for significant proportions of the variance (10.2, 5.8, and 4.4%, respectively). Second, an additional multiple regression analysis was performed while forcing Verbal IQ, Performance IQ, and Age in the equation at Step 1. Still, both Diagnosis Autism and Diagnosis PDDNOS contributed significantly to TOM test scores: partial rs being -.45 (p < .001) and -.22 (p < .05). Thus, even when controlling for IQ level and age, diagnoses still predicted TOM test performance; the more severe children's pervasive developmental disorder, the worse they performed on the TOM test.

Altogether, the results of Study 4 support the discriminant validity of the TOM test in that children with a PDD performed worse on the test than children with other psychiatric disorders. Furthermore, the findings indicate that this difference in TOM test performance is not carried by differences in intelligence. Even when controlling for intelligence, a significant and negative association between diagnoses of autism and PDDNOS, on the one hand, and TOM test performance, on the other hand, emerged.

GENERAL DISCUSSION

Theory of mind pertains to children's capacity to analyze the behavior of others by recognizing mental states (i.e., desires and beliefs) that underlie intentional and social behavior. Clearly, then, theory of mind consists of various aspects, such as the recognition of emotions, the assessment of how others think, and the understanding of the motives underlying behavior of others. The TOM test has been constructed to measure this broad range of aspects from a developmental perspective. The test intends to tap three successive stages in the development of theory of mind: precursors of theory of mind, first manifestations of a real theory of mind, and more advanced aspects of theory of mind.

The current study was a first attempt to investigate the reliability and validity of the TOM test. The main results can be summarized as follows. To begin with, the TOM test was found to be a reliable instrument; internal consistency was good, test-retest reliability was sufficient, and interrater reliability was high. Second, TOM test scores increased with age, indicating that the test is sensitive to developmental progression. In line with this, young children only succeeded on TOM items that tap basic domains of theory of mind, whereas older children also passed items that measure the more advanced areas of theory of mind. Third, evidence was obtained that supports the concurrent validity of the TOM test. That is, TOM test scores correlated positively and significantly with the performance on several other theory of mind tasks (i.e., tests of emotion recognition, understanding of false and second-order beliefs, and role taking). Fourth and finally, children with a PDD performed worse on the test than children with other psychiatric disorders. This suggests that the TOM test possesses discriminant validity.

The TOM test can be used in three ways. First, the test can be employed to screen children for deficits in theory of mind. There is some evidence to suggest that a poorly developed theory of mind can have negative social-emotional consequences, even in normal children (Lalonde & Chandler, 1995). Consequently, an instrument that measures the maturity of children's theory of mind at different age levels is important. Second, because the TOM test is informative about the developmental phase of children's theory of mind, it enables clinicians to tailor their intervention to specific problems of each child. For example, when the TOM test indicates that a child even fails on items that measure precursors of theory of mind, it would be futile to teach this child understanding of false beliefs. Third, the TOM test can be used to evaluate the efficacy of theory of mind training programs.

Altogether, the present findings imply that the TOM test is a reliable and valid instrument that can be employed to screen the development of theory of mind in 5- to 12-year-old normal children, children with pervasive developmental disorders, and other socially immature children.

APPENDIX

Examples of TOM Test Items

Each question represents a TOM test item which is scored as either failed (0) or passed (1). The subscale to
which each item belongs is mentioned in parentheses.

Example 1

Fig. A1. Picture of Example 1.

Instruction: Take a look at this picture.
Question 1: What has happened? Can you tell something about it? (TOM 1)
Question 2: Who in this picture is afraid? (TOM 1)
Question 3: Why is this person afraid? (TOM 2)
Question 4: Who in this picture is happy? (TOM 1)
Question 5: Why is this person happy? (TOM 2)
Question 6: Who in this picture is sad? (TOM 1)
Question 7: Why is this person sad? (TOM 2)
Question 8: Who in this picture is angry? (TOM 1)
Question 9: Why is this person angry? (TOM 2)

Example 2

Instruction: I will read you a short story. Listen carefully.
Story: Pim is one year old. He's at home, playing on the ground. Mother has given him a piece of apple. Suddenly, Pim bites his lip and he starts to cry. He throws the piece of apple on the ground. Mother lifts Pim up, comforts him, and puts the piece of apple on the table. When father arrives at home, mother is on the phone. Father lifts Pim up and hugs him. Then he puts Pim back on the ground, and gives him the piece of apple which is still lying on the table. As soon as Pim sees the piece of apple, he starts to cry.
Question 1: Why is Pim crying when father gives him the piece of apple? (TOM 1)
Question 2: Does father know why Pim is crying? (TOM 2)
Question 3: Does father know that Pim has bitten his lip when he wanted to eat the apple? (TOM 2)

Example 3

Fig. A2. Picture of Example 3.

Instruction: Take a look at this picture.
Question 1: What, do you think, is happening in this picture? (TOM 1)
Story: The two boys in the foreground gossip about the other boy. Suddenly, that boy approaches them and hears what they are saying. The two boys are startled.
Question 1: How does this boy feel? (point at the boy in the background) (TOM 1)
Question 2: How does this boy feel? (point at one of the boys in the foreground) (TOM 1)

Example 4

Fig. A3. Picture of Example 4.

Instruction: Take a look at this picture.
Question 1: What has happened in this picture? (TOM 1)
Question 2: How do you feel when you hurt yourself? (TOM 1)
Question 3: Can you see from the girl's face how she really feels? (TOM 2)
Question 4: Is it possible to look happy, when you have hurt yourself? (TOM 2)

Example 5

Fig. A4. Pictures of Example 5.

Instruction: Take a look at this picture.
Story: This is Ben. Ben wants to play with his bricks.
Question 1: Which box will Ben open to play with his bricks? (TOM 1)
Story: Ben opens the box of bricks, and surprisingly he finds out that it is filled with washing powder! He closes the box, and opens the other smaller box. There are his bricks! He takes out some bricks, and goes playing with them in his bedroom. Then his brother Tim is entering the room. Tim also wants to play with the bricks...
Question 2: Which box will Tim open to play with his bricks? (TOM 2)
Question 3: Do you know where the bricks really are? (TOM 2)

Example 6

Instruction: I will read you a short story. Listen carefully.
Story: Father and mother are at a birthday party. They only know a few people, and think the music is too loud. "Wow," says father, "It's a pleasure to be here!"
Question 1: What does father mean? (TOM 3)
Question 2: Why does father say: "It's a pleasure to be here!" (TOM 3)

Example 7

Question: Do as if you comb your hair. (TOM 1)
Question: Do as if you brush your teeth. (TOM 1)
Question: Do as if you are feeling cold. (TOM 1)
Question: How can I see that you are feeling cold? (TOM 2)
Question: Do as if you have a nasty drink. (TOM 1)
Question: How can I see that your drink is nasty? (TOM 2)
Question: Do as if you are scared. (TOM 1)
Question: How can I see that you are scared? (TOM 2)

Example 8

Fig. A5. Picture of Example 8.

Instruction: Take a look at this picture.
Story: This is John. John often dreams. Sometimes he dreams about a new bike that he likes to have.
Question 1: Is John able to touch the bike that he dreams about? (TOM 1)
Story: Sometimes John has a frightening dream. Then he dreams about shadows.
Question 2: Does John really see these shadows with his eyes? (TOM 1)
Question 3: Can somebody else see the shadows or the bike of John's dreams? (TOM 1)

Example 9

Instruction: I will read you a short story. Listen carefully.
Story: It is summer. Will and Mike have their holidays. They go out for a bicycle ride. Suddenly, there is a downpour and they have to shelter in a bus station. There are two men in the bus station who also shelter from the rain. One of the men remarks: "Wow, we have nice weather today!"
Question 1: What does the man mean? (TOM 3)
Question 2: Is it true what the man says? (TOM 3)
Question 3: Why does the man say: "Wow, we have nice weather today!" (TOM 3)

REFERENCES

American Psychiatric Association. (1987). Diagnostic and statistical manual of mental disorders (3rd ed., Rev.). Washington, DC: Author.

Astington, J. W., & Jenkins, J. M. (1995). Theory-of-mind development and social understanding. Cognition and Emotion, 9, 151-165.

Baron-Cohen, S., Leslie, A. M., & Frith, U. (1985). Does the autistic child have a theory of mind? Cognition, 21, 37-46.

Baron-Cohen, S., Leslie, A. M., & Frith, U. (1986). Mechanical, behavioral and intentional understanding of picture stories in autistic children. British Journal of Developmental Psychology, 4, 113-125.

Bowler, D. M., Strom, E., & Urquhart, L. (1993). Elicitation of first-order "theory of mind" in children with autism. Paper presented at the SRCD Conference, New Orleans, LA.

Eisenmajer, R., & Prior, M. (1991). Cognitive linguistic correlates of "theory of mind" ability in autistic children. British Journal of Developmental Psychology, 9, 351-364.

Flavell, J. H., Miller, P. H., & Miller, S. (1993). Cognitive development. Englewood Cliffs, NJ: Prentice-Hall.

Frith, U. (1989). Autism: Explaining the enigma. Oxford: Blackwell.

Hadwin, J., Baron-Cohen, S., Howlin, P., & Hill, K. (1996). Can we teach children with autism to understand emotions, belief, or pretence? Development and Psychopathology, 8, 345-365.

Happé, F. (1994). Wechsler IQ profile and theory of mind in autism: A research note. Journal of Child Psychology and Psychiatry, 35, 1461-1471.

Happé, F. (1995). The role of age and verbal ability in the theory-of-mind task performance of subjects with autism. Child Development, 66, 843-855.

Hogrefe, G. J., Wimmer, H., & Perner, J. (1986). Ignorance versus false belief: A developmental lag in attribution of epistemic states. Child Development, 57, 567-582.

Lalonde, C. E., & Chandler, M. J. (1995). False belief understanding goes to school: On the social-emotional consequences of coming early or late to a first theory of mind. Cognition and Emotion, 9, 167-185.

Leslie, A. M., & Frith, U. (1988). Autistic children's understanding of seeing, knowing and believing. British Journal of Developmental Psychology, 6, 315-324.

Ozonoff, S., & Miller, J. N. (1995). Teaching theory of mind: A new approach to social skills training for individuals with autism. Journal of Autism and Developmental Disorders, 25, 415-433.

Perner, J., Frith, U., Leslie, A. M., & Leekam, S. (1989). Exploration of the autistic child's theory of mind: Knowledge, belief and communication. Child Development, 60, 689-700.

Perner, J., & Wimmer, H. (1985). "John thinks that Mary thinks that...": Attribution of second-order beliefs by 5-10 years old children. Journal of Experimental Child Psychology, 39, 437-471.

Premack, D., & Woodruff, G. (1978). Does the chimpanzee have a theory of mind? Behavioural and Brain Sciences, 4, 515-526.

Prior, M., Dahlstrom, B., & Squires, T. (1990). Autistic children's knowledge of thinking and feeling states in other people. Journal of Child Psychology and Psychiatry, 31, 587-601.

Selman, R. L., & Byrne, D. F. (1974). A structural-developmental analysis of levels of role taking in middle childhood. Child Development, 45, 803-806.

Slaughter, V., & Gopnik, A. (1996). Conceptual coherence in the child's theory of mind: Training children to understand belief. Child Development, 67, 2967-2988.

Spence, S. (1980). Social skills training with children and adolescents: A counselor's manual. Windsor: NFER/Nelson.

Steerneman, P. (1994). Theory-of-mind screening-schaal [Theory-of-mind screening-scale]. Leuven/Apeldoorn: Garant.

Steerneman, P., Jackson, S., Pelzer, H., & Muris, P. (1996). Children with social handicaps: An intervention program using a theory-of-mind approach. Clinical Child Psychology and Psychiatry, 1, 251-263.

Swettenham, J. (1996). Can children with autism be taught to understand false belief using computers? Journal of Child Psychology and Psychiatry, 37, 157-165.

Vijtigschild, W., Berger, H. J. C., & van Spaendonck, J. A. S. (1969). Sociale Interpretatie Test [Social Interpretation Test]. Amsterdam: Swets & Zeitlinger.

Wechsler, D. (1974). Wechsler Intelligence Scale for Children (Rev.). New York: Psychological Corp.

Wellman, H. (1990). The child's theory of mind. Cambridge, MA: MIT Press.

Whiten, A., Irving, K., & Macintyre, K. (1993). Can three-year-olds and people with autism learn to predict the consequences of false belief? Paper presented at the British Psychological Society Developmental Section Annual Conference, Birmingham, UK.