Theory of mind, mirror neurons, and the massive modularity hypothesis

    "Are there other areas of human competence where one might hope to develop a fruitful theory, analogous to generative grammar? Although this is a very important question, there is very little that can be said about it today. One might, for example, consider the problem of how a person comes to acquire a certain concept of three dimensional space, or an implicit theory of human action, in similar terms. Such a study would begin with the attempt to characterize the implicit theory that underlies actual performance and would then turn to the question of how this theory develops under the given conditions of time and access to data - that is, in what way the resulting system of beliefs is determined by the interplay of available data, heuristic procedures, and the innate schematism that restricts and conditions the form of the acquired system. At the moment, this is nothing more than a sketch of a program of research."
    Noam Chomsky (1975), Linguistic contributions: Future[1]

Carl Mair

Introduction

Chomsky's sketch of a program of research, as outlined in the above quote, has since blossomed into the modularity hypothesis, a program comprising both a moderate and a strong stance. Moderate modularity, as argued for by Fodor, breaks human cognitive competences into input mechanisms and central systems, and contends that the former are hard-wired, encapsulated, inaccessible, domain-specific modules.[2] The central systems, on the other hand, which are equivalent to the higher cognitive faculties, are serviced by a domain-general intelligence. Strong or massive modularity differs from Fodor's account by contending that the central systems themselves are also Fodorean modules. This position has found its most forceful proponents in a group of psychologists who base their arguments on an appeal to evolutionary considerations.[3]
The position prides itself on a certain scientific tough-mindedness[4] and on its reliance on empirical data. One of the earliest and most thoroughly researched candidates for a modular central system is the cognitive competence of understanding other agents, or theory of mind. This essay will engage in a critical discussion of the so-called theory of mind module by relying on two main approaches. The first approach will be to examine the very recent neurological studies of Gallese et al.[5] into the mirror neuron systems involved in intentional attunement. Gallese's argument that other agents' intentional states are made understandable through autonomous and non-propositional embodied simulation will be presented as an alternative to the existence of a content-rich innate module, although its relationship to nativism will be shown to be more complex. The second approach will follow Sterelny in seeking to undercut the logical arguments for a theory of mind module, by presenting sophisticated exogenous scaffolding as a substitute for rich conceptual nativism. A third, shorter section will draw on the work of Arbib and Tomasello in order to shed some light on how these two approaches can be fruitfully synthesised. The conclusion will then seek a reconciliation of these two approaches in order to suggest a novel hypothesis of this crucial human cognitive competence.

[1] N. Chomsky, Language and Mind (Harcourt Brace, 1975), pp. 73-74
[2] J. Fodor, There and Back Again, in In Critical Condition (MIT, 1998), p. 128
[3] Cosmides and Tooby, but also Pinker
[4] Fodor, review of Pinker
[5] V. Gallese, Embodied simulation: From neurons to phenomenal experience

Embodied simulation: mirror neurons and intentional attunement

It is important at the outset to distinguish Gallese's theory of embodied simulation from Simulation Theory as argued for by Gordon and Goldman in the philosophy of mind.[6] While the latter involves a willed cognitive effort aimed at interpreting agents' intentions, Gallese conceives embodied simulation as an automatic, unconscious, and pre-reflexive functional mechanism.[7] Furthermore, in contrast to simulation theory, Gallese's model of embodied simulation is not a product of a priori reasoning about the nature of the mind, but a hypothesis which arose from detailed neurobiological studies of the brain. Before this hypothesis can be explained, a brief introduction to mirror neuron systems is needed. Gallese was among the original researchers who discovered the systems, and gives the following account:[8]

    About ten years ago a new class of pre-motor neurons discharging not only when the monkey executes goal-related hand actions like grasping objects, but also when observing other individuals (monkeys or humans) executing similar actions, was discovered in the macaque monkey brain. These neurons were called "mirror neurons".

Since then, numerous brain-scan experiments have found mirror system homologues in the brains of human agents. The micro-patterning of neural activation in the observer has been demonstrated to correspond to the activation patterns in the performing agent. Furthermore, the mirroring is not just restricted to grasping, but is also found in facial and speech-related mouth actions[9] and in the expression of emotions.[10]
The key requirement for mirror neuron activation seems to be intentional agentive action:[11]

    Mirror neurons constitutively map an agentive relation; the mere observation of an object not acted upon indeed does not evoke any response at all.

Moreover, only actions which belong to the motor repertoire of the observer (or are closely related) are mapped onto the observer's motor systems. Experiments involving humans' mirroring responses to monkeys, dogs and other humans have shown that though neural mirroring occurs in response to human silent speech (i.e. in-repertoire actions) and monkey lip-smacking (i.e. closely related actions), there is no mirroring in response to dog barking. Put another way, humans are perceptually tuned to salient actions by conspecifics.

Further experiments have also demonstrated that mirror neuron systems are predictive, in the sense of mediating inferences about the goals of the behaviour of others. In another experiment involving monkeys, the patterns of activation were compared between the full observation of a grasping action and one where the final stage of the action was hidden behind an occluder. In the latter case, the majority of the neurons recorded in the first case still responded, suggesting that by simulating the action the gap can be filled,[12] and that simulations are in effect models of intentional goal-directed actions.

[6] S. Guttenplan, A Companion to the Philosophy of Mind (Blackwell, 1999), p. 561
[7] V. Gallese, see n. 5 above, p. 41
[8] V. Gallese, see n. 5 above, p. 32
[9] The observation of human silent speech activated the...premotor sector of Broca's region, the same area responsible for performance. See V. Gallese, n. 5 above, p. 35
[10] The experiments to date have been restricted to that of disgust
[11] V. Gallese, see n. 5 above, p. 35

The significance of these findings for gaining some traction on the problem of how humans interpret each other's actions and intentions should be clear. But first, to summarise some main points. Human brains (and to a lesser extent, other primate brains[13]) contain populations of mirror neurons which respond to the observation of agentive actions. Whatever pattern of neural activation occurs in the performer's brain is mirrored in the corresponding loci of the observer's brain. This holds true for both transitive actions (like grasping, biting) and non-transitive actions (like speech-like mouth actions and the expression of emotion). Furthermore, these embodied simulations of performance can be used by the observer to make inferences about the goals of intentional behaviour, as shown by the occluder experiments. Gallese is alert to the philosophical implications of his findings for theory of mind, and indeed explicitly engages with the literature. In many ways, the model Gallese suggests is analogous to that of embodied cognition[14] in the representation debate, since what he stresses is the real-time coupled nature of agents interacting in the world.

In short, Gallese's conclusions are as follows. The folk-psychological approach to understanding other agents is solipsistic in that it assumes one agent understands another by giving an objective account of her behaviour according to propositional attitudes, like belief, desire etc.
We can and do give such objective descriptions of agents: when asked to recognize, discriminate, parameterize, or categorize the emotions or sensations displayed by others, we exert our cognitive operations by adopting a third-person perspective, aimed exactly at objectifying the content of our perceptions.[15] But real-time online understanding of other agents in interactive encounters is non-conceptual, non-declarative and non-propositional:[16]

    [To] perceive an action is equivalent to internally simulating it. This enables the observer to use her own resources to penetrate the world of the other by means of a direct, automatic, and unconscious process of motor simulation. Such simulation processes automatically establish a direct link between agent and observer, in that both are mapped in a neutral fashion.

Gallese calls this automatically generated inter-agent link intentional attunement, and prefers it to other epistemological approaches because it generates predictions about the intrinsic functional nature of our social cognitive operations that cut across, and neither necessarily depend on, nor are subordinate to, any specific cognitive mind ontology, including that of Folk Psychology.[17]

Notwithstanding the explanatory successes of this model, it is clear that intentional attunement does not give the whole story of our ability to understand other agents. As Gallese concedes, some social stimuli (particularly emotions) can only be understood by the explicit cognitive elaboration of their contextual aspects and previous information.[18] But these two mechanisms taken together do give the whole story. Embodied simulation and intentional attunement form the experience-based, non-propositional mechanism which scaffolds[19] the propositional, more sophisticated mentalizing mechanism.[20] Although Gallese uses the term mechanism in the singular, it is unlikely, by his own admission, that these second-tier sophisticated abilities would be restricted to any one specific region of the brain, and they would certainly be larger than a putative domain-specific Theory of Mind Module.[21]

The important question now is to explain this secondary, cognitively elaborate mechanism for understanding other agents. We now turn to Sterelny's account of folk psychology as an automated skill whose acquisition is scaffolded by downstream niche construction[22] for the second half of this story.

[12] V. Gallese, see n. 5 above, p. 33
[13] Scientific American??
[14] See Andy Clark
[15] V. Gallese, see n. 5 above, p. 31
[16] V. Gallese, see n. 5 above, p. 35
[17] V. Gallese, see n. 5 above, p. 31
[18] V. Gallese, see n. 5 above, p. 43

Downstream epistemic engineering

Superficially, it may seem that Gallese's and Sterelny's accounts of folk psychology are incompatible. Gallese's model presents the ability to understand other agents as a brain endowment, universal and innate. Sterelny, as we will see shortly, views folk psychology as a perceptually primed automatic skill, gained by learning. The first account is avowedly nativist; the second anti-nativist. However, Gallese's model differs from the usual nativist accounts of folk psychology in that it does not specify innate representational content. The mechanism is innate; but automatic, pre-reflexive intentional attunement is more like perception than it is like knowledge. It is of a very different character from a putative theory of mind module, and it is not one that suffers much from Sterelny's critique.

Sterelny's account provides that folk psychology is scaffolded by perceptual tuning.[23] But it is also further scaffolded by an engineered learning environment which helps the acquisition of the highly sophisticated and cognitively elaborate set of interpretive skills. Here there is an obvious parallel with Gallese's two mechanisms.
But there is also one obvious clash. While Sterelny's account views the perceptual mechanisms as merely biased towards picking up salient information in respect of agents' intentions, emotions etc., Gallese's model suggests that in addition to this bias (i.e. a shared repertoire between conspecifics), we also own those intentions, emotions etc., in that our mirror systems actively simulate them. It seems that Sterelny's account, like the account of folk psychology which Gallese originally attacked, endorses a kind of agentive solipsism. This is undeniable; it is only with something like a mirror system that the kind of inter-subjectivity Gallese argues for becomes coherent. Even so, this has no bearing on the force of Sterelny's deeper point. For a start, the model he develops is a hybrid between simulation theory and an account of how agents get the information to guide the simulations.[24] It will be recalled from the previous section that simulation theory is really just a solipsistic account of embodied simulation, and one that requires conscious cognitive effort as opposed to being unconscious and operationally autonomous. However, in terms of the explanatory role that it plays in Sterelny's account there is little difference between the two theories. The fact that intentional attunement sets up a direct neural link between agents does not mean that the strategic environment miraculously becomes transparent. Agents still deceive, fake emotions, and pretend to be other than what they are. Even with embodied simulation, the strategic environment is still translucent (though perhaps less so) rather than transparent. More information is required in order to make accurate interpretive judgments.

[19] Gallese suggests that the malfunctioning of this base mechanism may explain the failure of autistic individuals to have a fully functioning sophisticated mechanism. Experiments have shown that the mirror systems in autistic individuals are indeed impaired.
[20] V. Gallese, see n. 5 above, p. 43
[21] V. Gallese, see n. 5 above, p. 43
[22] K. Sterelny, Thought in a Hostile World (Oxford, 2002), p. 220
[23] K. Sterelny, see n. 22 above, p. 222
[24] K. Sterelny, see n. 22 above, p. 217

What follows is Sterelny's hunch on how agents get that extra information, and thus how Gallese's model acquires its second, sophisticated tier. Furthermore, all of these arguments undercut the case for the rich conceptual nativism of folk psychology.

Before Sterelny's account can be presented, a brief précis of the evolutionary psychology position with respect to high-order cognitive modules is required. This position depends on three main arguments: two logical, and one ‘just so’ story. The logical arguments are those of ‘poverty of the stimulus’ and the ‘frame problem’. The ‘poverty of the stimulus’ argument was appropriated from Chomsky’s defence of his innateness hypothesis. The argument runs: cognitive competence Z involves a multiplicity of highly complex rules and parameters. A general-purpose learning mechanism could only acquire Z after extensive tuition, and it would be slow. Since Z is acquired quickly and with minimal tuition, a large amount of this cognitive structure must be endogenous. The second logical argument, the ‘frame problem’, can be summarised like this. If the mind were a homogeneous domain-general learning mechanism, every cognitive ‘act’ would be accompanied by a ‘combinatorial explosion’, as the possible inputs would be enormous, with the result that cognition could not function. Since cognition does work, the possible inputs accompanying a cognitive ‘act’ must be limited. Limitation of inputs means domain-specific modules, not a single domain-general one. The third argument, the ‘just so’ story, is a speculation on the selection pressures that were extant in a hypothetical ancestral environment.
Cosmides and Tooby imagine a Pleistocene hunter-gatherer society and the kind of cognitive skills that its members might have needed, which would then have been modularized by the Baldwin effect. These include: face recognition, friendship, child care, theory of mind, social exchange, folk biology, folk physics etc.[25] Support for a theory of mind module generally follows the rough contours of the above, though sometimes with the addition of developmental and dissociative arguments from psychology as well.[26]

[25] S. Mithen, The Prehistory of the Mind (T&H, 1996), p. 45
[26] There is evidence that mindreading skills go through a maturation process much like language; and also that mindreading skills dissociate from other high cognitive skills.

Though they will not be examined here, Sterelny devotes a fair amount of time to showing the weaknesses in these last two arguments. However, the brunt of his thesis is addressed to the classic Chomskyan arguments for innate structure, poverty of the stimulus and the frame problem.

Sterelny's argument for the scaffolding of folk psychology on a carefully constructed learning environment is essentially one of wealth of the stimulus. While something like embodied simulation could form the primary mechanism for intentional understanding, the cognitively elaborate and context-sensitive second-tier mechanism is learnt in an environment of high-fidelity signal. The term Sterelny uses for this careful structuring of the learning environment is ‘downstream epistemic engineering’: the fact that the way the Nth generation structures its environment affects the way the N+1th generation interprets and perceives it, and so on to the N+Nth generation. Put another way, ‘We engineer the informational environment of our downstream generation, thus making for more accurate and reliable acquisition of key capacities’.[27] Furthermore, the character of these learning environments for interpretive skills is determined by selection pressures:[28]

    Selection for interpretive skills could lead to a different evolutionary trajectory: selection on parents (and via group selection, on the band as a whole) for actions which scaffold the development of interpretive capacities. Selection rebuilds the epistemic environment to scaffold the development of those capacities.

Language would also be a crucial part of that learning environment in that it helps in the identification of perceptually salient inputs:[29]

    Labelling turns perceptual tasks into memory tasks…[it] makes aspects of the world transparent by establishing a one-to-one correspondence between sensory properties and functional ones.

By reducing the bewildering array of possible distinctions to only that set of salient inputs, the learner of interpretive skills would also avoid the computational frame problem.

Before we go on to tie the threads of Sterelny's and Gallese's models together, it is useful to summarise Sterelny's main points. Folk psychology is an acquired automatic skill. The cognitively sophisticated and context-sensitive mechanisms of mind-reading are learned in a wealth-of-stimulus environment that has been selected for its efficacy in making interpretive capacities available to novices.
Such exogenous scaffolding would make the skill of mindreading immune to both poverty of the stimulus and frame problem critiques. Furthermore, the learning process is aided by both perceptual tuning and internal simulation. This essay has taken the liberty of substituting for Sterelny's posited perceptual mechanisms and simulation theory the single cognitive feature of embodied simulation, which explains both features under a single model: actions, gestures and emotions are salient because they are part of the observer's repertoire; intentions and goal-directed behaviour are understood by the automatic, unconscious process of intentional attunement.

Both Tomasello and Arbib have posited models which also explain higher human cognitive faculties as a combination of innate brain capacities and human cultural history. We will briefly review these before offering a conclusion on the matters discussed.

Tomasello and Arbib

The work of Tomasello puts down the singularity of human beings to their ability to accumulate and modify cultural capital, according to the ratchet effect.[30] He traces this ability to our cognitive capacity to learn imitatively. Furthermore, Tomasello explains this ability as dependent on a cognitive capacity which seems almost identical to Gallese's description of mirror neurons:[31]

    Imitative learning does not just mean mimicking the surface structure of poorly understood [...] also means reproducing an instrumental act understood intentionally, that is reproducing not just the behavioural means but also the intentional end for which the behavioural means was formulated. This requires some specially adapted skills of social cognition.

[27] K. Sterelny, The Evolution and Evolvability of Culture, p. 25
[28] K. Sterelny, Thought in a Hostile World, p. 221
[29] K. Sterelny, Thought…, p. 154
[30] The pattern of accumulation and modification of cultural artifacts through time.
[31] M. Tomasello, The Human Adaptation for Culture

It seems plausible that mirror neuron systems fit the bill of the specially adapted skills of social cognition which allow agentive actions to be understood intentionally. At the risk of over-simplification, Tomasello seems to be suggesting the following hierarchy of competences to explain human singularity:

    Mirror neuron systems which allow agents to understand actions intentionally
    X) which scaffolds imitative learning
    1.) which scaffolds the group's accumulation of skills and practices; construction of niche
    2.) which further scaffolds the accumulation/modification of increasingly complex skills and practices.

X is the underlying skill which is the pre-condition for the accumulation of cultural capital. The cultural capital acquired at step 1 by the Nth generation is modified at step 2 by the N+1th generation. The greater the number of repetitions of this process, that is, the greater the number of turns of the ratchet (i.e. the greater the value of N), the richer, more refined and more advanced are the types of skills available. More cognitively demanding skills require a large amount of exogenous scaffolding, and thus are more likely to come about later on, unless they are a pre-condition for this process.[32] Tomasello suggests that these facts provide a sufficient explanation for the existence of many of the most distinctive cognitive products that human beings produce.[33] It should also be noted that Tomasello's model of cultural skill acquisition is probably recapitulated on the level of the maturing skill-acquiring individual.[34]
Sterelny's significant contribution to Tomasello's story is his application of this process of cultural learning to understanding other agents, and in particular deceptive agents.

The work of Arbib on the relationship between mirror systems and language has suggested similar conclusions, although his sketch of the chronology of evolved skills links up where Tomasello's starts. Summarised briefly, Arbib sees sophisticated cognitive skills as the end point of the following chronology in stages (s):[35]

    s1: Grasping
    s2: Mirror system for grasping, shared with common ancestor of humans and monkey
    s3: Simple imitation system for object-directed grasping. Shared with humans and chimps
    s4: Complex imitation system for grasping. Hominid line only.

The final stage is claimed to involve little if any biological evolution, but instead to result from historical evolution (historical change), and is specific to Homo sapiens. It is clear that this final stage is where Tomasello's and Sterelny's accounts of cultural evolution and epistemic engineering enter the story.

The shared thesis between Sterelny, Tomasello and Arbib, in contrast to those who argue for nativism with respect to cognitive capacities, is that many of these capacities are the product of cultural learning scaffolded on general brain endowments.

[32] Perhaps such as language
[33] M. Tomasello, see n. 31 above, p. 513
[34] Although the repertoire of acquirable skills for the individual will obviously be composed of both self-directed trial and error learning as well as culturally acquired skills
[35] Arbib, The Mirror System Hypothesis. Linking Language to Theory of Mind, p. 2
Conclusion

In contrast to the position of the evolutionary psychologists, this essay has argued that domain-general learning mechanisms are sufficient for acquiring high-level cognitive skills, and in particular the ability to understand other agents. But this argument only goes through if the brain is granted some specific endowments. The mirror systems discovered by the empirical studies of Gallese are such an endowment. This mechanism allows agents to be perceptually tuned to interpretively salient behaviour of other agents, and embodied simulation allows agents direct access to the intentions and goals of other agents. However, this mechanism only makes other agents' intentions, emotions and sensations translucent. The strategic environment is one where agents lie, fake emotions, and pretend to be other than what they are. This mechanism thus needs to be supplemented by culturally mediated learning which allows agents to interpret others. Mirror systems provide Homo sapiens with the ability to engage in imitative learning, which allows the accumulation of cultural capital, including sophisticated interpretive strategies. Sterelny's model of epistemic engineering allows these skills to be exogenously scaffolded such that they can be acquired with the requisite high fidelity. Agents may lie, cheat and fake their intentions, but cultural learning gives interpreting agents the skills to deal with these strategies.