Neuroscience and Biobehavioral Reviews 30 (2006) 949–960
www.elsevier.com/locate/neubiorev
Review
From manual gesture to speech: A gradual transition
Maurizio Gentilucci a, Michael C. Corballis b
a Department of Neuroscience, University of Parma, Parma I-43100, Italy
b Department of Psychology, University of Auckland, Private Bag 92019, Auckland, New Zealand
Received 6 October 2005; received in revised form 15 February 2006; accepted 16 February 2006
Abstract
There are a number of reasons to suppose that language evolved from manual gestures. We review evidence that the transition from primarily manual to primarily vocal language was a gradual process, and is best understood if it is supposed that speech is itself a gestural system rather than an acoustic system, an idea captured by the motor theory of speech perception and by articulatory phonology. Studies of primate premotor cortex, and in particular of the so-called "mirror system", suggest a double hand/mouth command system that may have evolved initially in the context of ingestion, and later formed a platform for combined manual and vocal communication. In humans, speech is typically accompanied by manual gesture, speech production itself is influenced by executing or observing hand movements, and manual actions also play an important role in the development of speech, from the babbling stage onwards. The final stage, at which speech became relatively autonomous, may have occurred late in hominid evolution, perhaps with a mutation of the FOXP2 gene around 100,000 years ago.
© 2006 Elsevier Ltd. All rights reserved.
Keywords: Speech; Gesture; Mirror system; FOXP2 gene; Evolution
Contents
1. Introduction
2. The gestural-origins theory
   2.1. The argument from signed language
   2.2. Primate origins
   2.3. The mirror system
   2.4. A gradual switch?
3. Connections between hand and mouth: empirical evidence
4. Evolutionary speculations
   4.1. When did the changes occur?
   4.2. The "human revolution"
5. Conclusion
Acknowledgments
References
1. Introduction
Language is composed of symbols, which bear little or
no physical relation to the objects, actions, or properties
they represent. This poses problems in the understanding
ARTICLE IN PRESS
950
M. Gentilucci, M.C. Corballis / Neuroscience and Biobehavioral Reviews 30 (2006) 949–960
of how language evolved, since it is not obvious how abstract symbols could become associated with aspects of the real world. One theory, proposed by Paget (1930) and called "schematopoeia," holds that spoken words arose initially from parallels between sound and meaning. For example, in many languages open vowels occur in words denoting something large, and closed vowels in words denoting something small (Italian gr/a/nde vs. p/i/ccolo; note too that "a" is pronounced differently in the English words large and small).
Nevertheless most of the things we talk about cannot be
represented iconically through sound, and with very few
exceptions (zanzara, buzz, hum) the actual sounds of most
words convey nothing of what they mean. This raises the
paradox that was well expressed by Rousseau (1775/1964),
who remarked that ‘‘Words would seem to have been
necessary to establish the use of words’’ (pp. 148–149).
In this article we argue that the problem is to some
extent alleviated if it is supposed that language evolved
from manual gestures rather than from vocalizations, since
manual actions can provide more obvious iconic links with
objects and actions in the physical world. Early proponents
of this view were the 18th-century philosophers de
Condillac (1971/1756) and Vico (1953/1744) but it has
been put forward, with variations, many times since (e.g.,
Arbib, 2005; Armstrong, 1999; Armstrong et al., 1995;
Corballis, 1992, 2002; Donald, 1991; Givòn, 1995; Hewes,
1973; Rizzolatti and Arbib, 1998; Ruben, 2005).
The remainder of this article is in three parts. First, we
outline the arguments for the gestural-origins theory.
Second, we present recent data demonstrating close links
between movements of the hand and mouth, adding
support to the theory. Third, we speculate as to the
possible sequence of events in our evolutionary history that
may have led to the replacement of a visuo-manual system
by an auditory–vocal one.
2. The gestural-origins theory
2.1. The argument from signed language
Part of the argument is based on the fact that the signed
languages of the deaf are entirely manual and facial, but
display most, at least, of the essential linguistic properties
of spoken language (Emmorey, 2002; Neidle et al., 2000;
Stokoe, 1960). It is well recognized that signs are
fundamentally different from gestures of the sort that
occur in everyday life, independently of any linguistic
function, and which are iconic or indexical rather than
symbolic. Despite the symbolic nature of signs, though,
there is also an analog, iconic component to signed
languages, suggesting a link to a more iconic form of
communication. In the course of evolution, then, pantomimes of actions might have incorporated gestures that are
analog representations of objects or actions (Donald,
1991), but through time these gestures may have lost the
analog features and become abstract. The shift over time
from iconic gestures to arbitrary symbols is termed
conventionalisation. It appears to be common to both
human and animal communication systems, and is
probably driven by increased economy of reference
(Burling, 1999).
Nevertheless some have argued that the properties of
sign languages are fundamentally different from those of
speech, suggesting that the two may have evolved
independently. For example, it has been claimed that
signed language does not exhibit duality of patterning (e.g.,
Armstrong et al., 1995), which Hockett (1960) proposed as
one of the distinguishing features of language. In speech,
duality refers to the distinction between phonology, in
which elements of meaningless sound are combined to form
meaningful units called morphemes, and syntax, in which
morphemes are combined to form higher-order linguistic
entities. For signed language, Stokoe (1991) proposed a
theory of semantic phonology, in which the components of
signs are themselves meaningful, thus precluding duality in
the strict sense. More recent sign-language models, though,
suggest that the sublexical units of signs are not meaningful, and use the term "phonology" to apply equally to sign languages and to speech (e.g., Brentari, 1998; Liddell and Johnson, 1989; Sandler, 1989; Van der Hulst, 1993).
The four basic phonological categories of American Sign Language (ASL), known as parameters, are
handshape, location, movement, and orientation of the
hands (Emmorey, 2002), and the same elements have been
identified in Italian Sign Language (LIS, Volterra, 2004/
1987). As evidence that these are independent of meaning,
it has been shown that deaf signers show a "tip-of-the-fingers" (TOF) effect comparable to the "tip-of-the-tongue" (TOT) effect shown by speakers. The TOT is
induced when speakers cannot retrieve a word they know
they know, but can often correctly produce one or more
phonemes (usually the first). Similarly, TOF refers to a
state in which the signer cannot produce a sign she/he
knows, but correctly produces one or more parameters of
the target. Just as TOT depends on a distinction between
semantics and phonology in speech, so TOF indicates a
similar distinction in signed language, supporting duality of
structure (Thompson et al., 2005).
Another difference lies in the nature of the lexical units.
Although many signs have lost their iconic form, sign
languages retain iconic or analog components that have led
some authors to doubt that spoken language could have
evolved from gestures (e.g., Talmy, in press). In particular,
sign languages have a "classifier" subsystem that is analog
and gradient in character, and that has no parallel in
spoken languages. This system applies primarily to the
representation of spatial attributes like motion and
location (Emmorey, 2002). For example, a signer might
represent the motion of a car passing a tree by making the
sign for a car (thumb raised, index and middle finger
extended forward) with the dominant hand, and a tree
(forearm upright and five fingers extended) with the
nondominant hand, and then moving the dominant hand
horizontally across the torso past the nondominant hand.
ARTICLE IN PRESS
M. Gentilucci, M.C. Corballis / Neuroscience and Biobehavioral Reviews 30 (2006) 949–960
The movement could be varied to indicate different kinds
of motion, say upwards or downwards to represent a slope,
quickly or slowly to represent speed, and so forth. These
representations of motion are analog in the sense that they
map directly onto the actual motion that is referred to, and
the representations of the car and the tree have at least a
vestigial iconic aspect. In spoken language, by contrast,
different characteristics of motion are represented categorically, using morphemes or phrases such as pass, climb,
move quickly, etc., and the words car and tree in no way
resemble the objects themselves. Talmy (in press) suggests
that if spoken language had evolved from a manual system,
one would expect a continuation of more analog representation, perhaps with rising pitch to indicate climbing
motion, rapid speech to indicate fast motion, and so on.
The fact that spoken languages are almost entirely
dependent on discrete, recombinant representations suggests, according to Talmy, that language evolved through
the vocal auditory channel, with the visuo-manual system a
secondary form.
Of course, signed and spoken languages are end-states,
and need not represent intermediate stages of transition.
We shall argue that the differences between signed and
spoken languages have to do primarily with the medium
through which language is expressed, rather than with the
nature of language itself, and that these differences do not
preclude a gradual transition from one form to another.
The visuo-manual medium lends itself to efficient representation of spatial concepts in analog fashion, and to a
greater degree of parallel transmission than is possible
using voicing. The auditory–vocal medium, in contrast,
lacks an effective spatial dimension, and forces serial
transmission. Of course some degree of analog representation is possible, and is sometimes used, as in expressions
such as ‘‘up, up, up and away,’’ or ‘‘he’s wa-a-a-ay too
young to understand.’’ On the whole, though, it is probably
much more efficient to use the combinatorial capacities of
the vocal system to create categorical representations.
Although the nature of the differences between signed
and spoken languages remains somewhat controversial, it
now seems reasonably clear that they share the same
underlying structure. Emmorey (2002) summarizes as
follows:
The research strategy of comparing signed and spoken
languages makes it possible to tease apart which
phonological entities arise from the modality of
articulation and perception and which properties arise
from the nature of the expression system of human
language, regardless of modality. The results thus far
suggest that basic phonological entities such as distinctive features, segments, and syllables do not arise
because language is spoken; that is, they do not arise
from the nature of speech. Although the detailed
structure of these entities differs (e.g., distinctive
features in signed language are based on manual, rather
than oral, articulation) they appear to play the same
organizational role for both signed and spoken languages (pp. 41–42).
We suggest, then, that there is sufficient commonality
between sign language and speech to give credence to the
idea that language evolved from manual gestures. The
question of how language might have been transformed
from something resembling sign language to vocal speech is
considered in Section 3 below.
2.2. Primate origins
Neurophysiological evidence suggests that nonhuman
primates have little if any cortical control over vocalization, which is critical to speech. This implies that the
common ancestor of humans and chimpanzees was much
better preadapted to develop a voluntary communication
system based on visible gestures rather than sounds. Ploog
(2002) documents two neural systems for vocal behavior, a
cingulate pathway and a neocortical pathway. In nonhuman primates vocalization is largely, if not exclusively,
dependent on the cingulate system. The neocortical system
is progressively developed for voluntary control of manual
movements, including relatively independent finger movements, from monkeys to apes to humans, and is
indispensable for voluntary control (e.g., Hepp-Raymond,
1988). Only in humans is the neocortical system developed
for precise voluntary control of the muscles of the vocal
cords. Monkeys do make extensive use of facial expressions
for communication, but these are more obviously gestural
than language-like (Van Hooff, 1962, 1967). Attempts to
teach vocal language to great apes have achieved much
greater success in communicating in language-like fashion
through manual signs than in acquiring anything resembling vocal language (e.g., Gardner and Gardner, 1969;
Savage-Rumbaugh et al., 1998), which is further evidence
that voluntary control is more highly developed manually
than vocally in our closest primate relatives. The human
equivalents of primate vocalizations are probably emotionally based sounds like laughing, crying, grunting, or
shrieking, rather than words. With the emergence of
bipedalism in the hominid line some 6 million years ago,
the hands were freed from locomotion, providing a
potential boost to the evolution of manual communication.
2.3. The mirror system
Further support for gestural origins comes from the
discovery of neurons in area F5 in the ventral premotor
cortex of the monkey that fire when the animal makes
movements to grasp an object with the hand or mouth
(Rizzolatti et al., 1988). Another set of neurons in the
ventral premotor cortex of the monkey, dubbed ‘‘mirror
neurons,’’ fire also when the animal observes another
individual making the same movements (Ferrari et al.,
2003; Gallese et al., 1996; Rizzolatti et al., 1996). More
recent discoveries, based on both neurophysiological
recordings in primates and functional brain imaging in
humans, have identified a more general mirror system,
involving temporal, parietal, as well as frontal regions, that
is specialized for the perception and understanding of
biological motion (Rizzolatti et al., 2001). In monkeys this
system has been demonstrated primarily for reaching–grasping movements, although it also maps certain movements, such as tearing paper or cracking nuts, onto the
sounds of those movements (Kohler et al., 2002). So far,
there is no clear evidence for a mapping of the production
of vocalizations onto the perception of vocalizations.
However, this mapping is implicit in humans in the so-called motor theory of speech perception (Liberman et al., 1967), which holds that we understand speech in
terms of how it is produced rather than in terms of its
acoustic properties.
More detailed study of area F5 suggests further
specializations of relevance to the understanding of manual
action. This area is located in the rostral part of the ventral
premotor cortex, and consists of two main sectors, one
located on the dorsal convexity (F5c), the other on the
posterior bank of the inferior arcuate sulcus (F5ab).
Both sectors receive strong input from the second
somatosensory area (SII) and area PF. In addition, F5ab
is the selective target of parietal area AIP (for a review, see
Rizzolatti and Luppino, 2001). Single-neuron recording
studies have shown not only that F5 neurons code specific
actions, such as ‘‘grasping’’, ‘‘holding’’, or ‘‘tearing’’, but
also that many of them code specific types of hand shaping,
such as the precision grip. It is worth noting that hand
shape is an important component of human signed
languages (e.g., Emmorey, 2002). F5 neurons frequently
discharge when the grasping action is performed with the
mouth as well as with the hand (Rizzolatti et al., 1988; see
Fig. 1). These neurons may be functionally involved in
preparing the mouth to grasp the object when the hand
grasps it (Gentilucci et al., 2001), thereby encoding the goal
of the action (taking possession of the object, Rizzolatti
et al., 1988). From an evolutionary point of view, they
may have been instrumental in the transfer of the
gestural communication system from the hand to the
mouth (see below).
Fig. 1. Study of a neuron responding to grasping with the hand and the mouth. The left upper panel shows the approximate location of area F5 on a
lateral view of the monkey brain. (A) Neuron discharge during grasping with the mouth. (B) Neuron discharge during grasping with the hand contralateral
to the recording side. (C) Neuron discharge during grasping with the ipsilateral hand. Rasters and histograms are aligned with the moment when the
animal touches the food.
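The rasters and histograms described in the Fig. 1 caption are peri-event time histograms: on each trial the spike train is re-aligned to the alignment event (here, the moment the animal touches the food) and binned to estimate firing rate. As an illustration only (this is not the authors' analysis code), a minimal sketch of that computation, assuming hypothetical arrays of spike and event times in seconds:

```python
import numpy as np

def peri_event_histogram(spike_times, event_times, window=(-1.0, 1.0), bin_s=0.02):
    """Mean firing rate (spikes/s) in bins around each event, across trials."""
    edges = np.arange(window[0], window[1] + 1e-9, bin_s)
    counts = np.zeros(len(edges) - 1)
    for t0 in event_times:                         # one trial per alignment event
        rel = spike_times - t0                     # spike times relative to the event
        rel = rel[(rel >= window[0]) & (rel < window[1])]
        counts += np.histogram(rel, bins=edges)[0]
    return edges[:-1], counts / (len(event_times) * bin_s)

# Hypothetical data: a neuron that fires a burst just after object contact.
rng = np.random.default_rng(0)
touches = np.arange(5.0, 105.0, 5.0)               # 20 simulated "touch" events
spikes = np.sort(np.concatenate([touches + rng.exponential(0.2, size=20),
                                 rng.uniform(0.0, 110.0, size=200)]))
bins, rate = peri_event_histogram(spikes, touches)
```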
F5 neurons can fire during specific phases of the grasp,
and some of them, known as canonical neurons, are
activated simply by the presentation of a graspable
object (Murata et al., 1997; Rizzolatti et al., 1988).
Canonical neurons, which are mostly located in the sector
F5ab, are distinct from the mirror neurons described
above, which are found generally in sector F5c. Nevertheless, mirror neurons also frequently respond to grasping
actions, whether executed or observed, and may be
sensitive to the particular type of grip used in the action
(Gallese et al., 1996), but they do not respond to the
simple presentation of a graspable object. As of now,
no electrophysiological data (for example using multielectrode recording techniques) are available showing
temporal and functional relationships between canonical
and mirror neurons.
Because the mirror system is activated when observing
and executing the same hand action, it can be considered to
be involved in understanding the meaning of the action (Gallese
et al., 1996). It might therefore have provided the link
between actor and observer that also exists between sender
and receiver of messages. Rizzolatti and Arbib (1998)
proposed that the mirror system was used as an initial
communication system in language evolution. Indeed, a
comparable mirror system has also been inferred in
modern humans, based on evidence from electroencephalography (Muthukumaraswamy et al., 2004), magnetoencephalography (Hari et al., 1998), transcranial magnetic
stimulation (Fadiga et al., 1995), and functional magnetic
resonance imaging (Iacoboni et al., 1999). Area F5 is also
considered the homologue of Broca’s area in the human
brain (Rizzolatti and Arbib, 1998), and the mirror system
in general corresponds quite closely with the cortical
circuits, usually on the left side of the human brain, that
are involved in language, whether spoken or signed. The
perception and production of language might therefore be
considered part of the mirror system, and indeed part of
the more general system by which visuo-motor (and audio-motor) integration is used in the understanding of
biological motion.
2.4. A gradual switch?
As anticipated earlier, a critical question for the theory
that language evolved from manual gestures is how the
medium of language shifted from a manuo-visual system to
a vocal-acoustic one. It is likely that this switch was not an
abrupt one, but was rather a gradual change, in which
language evolved initially as a largely manual system, but
facial and vocal elements were gradually introduced, and
evolved to the point that vocalization became the
predominant mode (Corballis, 2002). McNeill (1992) has
pointed out, though, that even today speech-synchronized
manual gestures should be considered part of language, so
the dominance of speech is not complete.
The argument for continuity between manual and vocal
language is supported by evidence that speech itself is
fundamentally gestural. This idea is captured by the motor
theory of speech perception (Liberman et al., 1967), and by
what has more recently become known as articulatory
phonology (Browman and Goldstein, 1995). In this view
speech is regarded, not as a system for producing sounds,
but rather as a system for producing articulatory gestures,
through the independent action of the six articulatory
organs—namely, the lips, the velum, the larynx, and the
blade, body, and root of the tongue. This approach is
based largely on the fact that the basic units of speech,
known as phonemes, do not exist as discrete units in the
acoustic signal (Joos, 1948), and are not discretely
discernible in mechanical recordings of sound, as in a
sound spectrograph (Liberman et al., 1967). One reason for
this is that the acoustic signals corresponding to individual
phonemes vary widely, depending on the contexts in which
they are embedded. In particular, the formant transitions
for a particular phoneme can be quite different, depending
on the neighboring phonemes. Yet we can perceive speech
at remarkably high rates, up to at least 10–15 phonemes
per second, which seems at odds with the idea that some
complex, context-dependent transformation is necessary.
Indeed, even relatively simple sound units, such as tones or
noises, cannot be perceived at comparable rates (Warren et
al., 1969), which further suggests that a different principle
underlies the perception of speech. The conceptualization
of speech as gesture overcomes these difficulties, at least to
some extent, since the articulatory gestures that give rise to
speech partially overlap in time (co-articulation), which
makes possible the high rates of production and perception
(Studdert-Kennedy, 2005).
MacNeilage (1998) has drawn attention to the similarity
between human speech and primate sound-producing facial
gestures such as lip smacks, tongue smacks, and teeth
chatters. Ferrari et al. (2003) recorded discharge both from
mirror neurons in monkeys during the lip smack, which is
the most common facial gesture in monkeys, and from
other mirror neurons in the same area during mouth
movements related to eating. This suggests that nonvocal
facial gestures may indeed be transitional between visual
gesture and speech. This is supported by the increasing
recognition that gestures of the face, and more particularly
of the mouth, are components of sign languages, and are
distinct from mouthing, where the signer silently produces
the spoken word simultaneously with the sign that has the
same meaning. Mouth gestures have been studied primarily
in European signed languages, and schemes for the
phonological composition of mouth movements have been
proposed for Swedish (Bergman and Wallin, 2001), English
(Sutton-Spence and Day, 2001) and Italian (Ajello et al.,
2001) Sign Languages. Mouth gestures can serve to
disambiguate hand gestures, and as part of more general
facial gestures provide the equivalent of prosody in speech
(Emmorey, 2002). This work is still in its infancy,
but suggests an evolutionary scenario in which mouth movements gradually assumed dominance over hand movements and were eventually accompanied by voicing and
movements of the tongue and vocal tract. Thus, we suggest,
speech was born.
One interesting class of mouth gestures constitutes what is
known as echo phonology, in which movements of the
mouth parallel movements of the hand. For example, the
mouth may open and close in synchrony with the opening
and closing of the hand (Woll, 2002). This may reflect a
fundamental relationship between hand and mouth, which
we explore in the next section.
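Echo phonology invites a simple quantitative check: if the mouth opens and closes with the hand, the two aperture time series should be maximally correlated at a lag near zero. A minimal sketch of such a synchrony measure, assuming hypothetical, equally sampled aperture signals (the function name and data conventions are illustrative, not from the studies cited):

```python
import numpy as np

def aperture_synchrony_lag(mouth, hand, fs):
    """Lag (s) at which mouth and hand aperture are maximally cross-correlated.

    mouth, hand: 1-D aperture signals (e.g., mm) sampled at fs Hz.
    A lag of zero means the two effectors open and close in phase.
    """
    m = mouth - mouth.mean()
    h = hand - hand.mean()
    xcorr = np.correlate(m, h, mode="full")      # correlation at every shift
    best = np.argmax(xcorr) - (len(hand) - 1)    # shift in samples; 0 = in phase
    return best / fs
```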
3. Connections between hand and mouth: empirical evidence
Recent evidence suggests not only that speech is itself
gestural, but that there are intimate connections between
hand and mouth, in monkeys as well as in humans. As we
have seen, the mirror system in the monkey is related to
both arm (Gallese et al., 1996; Rizzolatti et al., 1996) and
mouth actions (Ferrari et al., 2003). This suggests that
gestures of the mouth might have been added to the
manual system to form a combined manuo-facial gestural
system. Up to now a mirror system has been documented
only for arm and mouth actions, and the anatomical
closeness of hand and mouth cells in the premotor cortex
may relate to the involvement of both effectors in common
goals. Since food is acquired mainly by using the hand and mouth, it is important for survival that the animal can extract the meaning and goal of such actions from visual analysis. Area F5
of the monkey premotor cortex also includes a class of neurons that discharge when the animal grasps an object with either the hand or the mouth (Rizzolatti et al.,
1988). Gentilucci et al. (2001) infer a similar class of
neurons in humans. They showed that when subjects were
instructed to open their mouths while grasping objects, the
size of the mouth opening increased with the size of the
grasped object, and conversely, when they opened their hands
while grasping objects with their mouths, the size of the
hand opening also increased with the size of the object.
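The covariation reported here can be summarized by regressing the peak aperture of one effector on the size of the object acted on by the other; a positive slope captures the hand-mouth coupling. A minimal sketch with hypothetical per-trial values (illustrative numbers, not the published data):

```python
import numpy as np

# Hypothetical trials: object size (mm) grasped by the hand, and the peak
# aperture (mm) of the simultaneously opened mouth.
object_size = np.array([20, 20, 40, 40, 60, 60, 80, 80], dtype=float)
mouth_aperture = np.array([31.0, 32.1, 33.4, 33.0, 34.8, 35.5, 36.9, 36.2])

slope, intercept = np.polyfit(object_size, mouth_aperture, 1)   # least-squares line
r = np.corrcoef(object_size, mouth_aperture)[0, 1]
print(f"aperture = {slope:.3f} * size + {intercept:.1f}  (r = {r:.2f})")
```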
Mirror neurons have also been recorded in the parietal
cortex of the monkey, which is closely connected with F5;
some neurons discharge when the monkey observes a grasp
action performed with the hand and when the monkey
executes a grasp action with the hand or the mouth
(Gallese et al., 2002). In the evolution of communication,
this mechanism of double command to hand and mouth
could have been instrumental in the transfer of a
communication system, based on the mirror system, from
movements of the hand to movements of the mouth.
Grasping movements of the hand also affect the
kinematics of speech itself. Grasping larger objects
(Gentilucci et al., 2001) and bringing them to the mouth
(Gentilucci et al., 2004a) induces selective increases in
parameters of lip kinematics and voice spectra of syllables
pronounced simultaneously with action execution. Even
observing another individual grasping or bringing to the
mouth larger objects affects the lip kinematics and the
voice spectra of syllables simultaneously pronounced by
the viewer (Gentilucci, 2003; Gentilucci et al., 2004a, b).
Again, then, action observation induces the same effects as
action execution. The effects on voicing and lip kinematics
are dependent on the arm movement itself, and not on the
nature of the grasped objects. Indeed, the same effects were
found when either fruits or geometrical solids were
presented, and even when no object was presented (i.e.,
the action was pantomimed, see Fig. 3). By using the
mirror system, an individual observing an arm action can
automatically and covertly execute the same action in order
to interpret the meaning of the action. For manual actions
functionally related to oro-facial actions, the motor command is also sent to the mouth, and reaches the
threshold for execution when the mouth is already
activated to pronounce the syllable.
Gentilucci and colleagues (Gentilucci, 2003; Gentilucci
et al., 2001, 2004b) observed that execution/observation of
the grasp with the hand activates a command to grasp with
the mouth, which modifies the posture of the anterior
mouth articulation, according to the hand shape used to
grasp objects of different size. This, in turn, affects formant
1 (F1) of the voice spectra, which is related to mouth
aperture (Fig. 2). Conversely, execution/observation of the
bringing-to-the-mouth action probably induces an internal
mouth movement (as for example chewing or swallowing),
which affects tongue displacement according to the size of
the object being brought to the mouth (Gentilucci et al.,
2004a, b). This, in turn, modifies speech formant 2 (F2),
which is related to tongue position (Fig. 3). On the basis of
these results we propose that, early in language evolution,
communication signals related to the meaning of actions
(e.g., taking possession of an object by grasping, or
bringing an edible object to the mouth) might have been
associated with the activity of particular articulatory
organs of the mouth that were later co-opted for speech.
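F1 and F2 are the two lowest resonances of the vocal tract, and can be estimated from a voiced frame of the acoustic signal, for example by linear predictive coding (LPC), where the angles of the complex roots of the LPC polynomial give candidate formant frequencies. A minimal sketch of this standard technique (not the analysis pipeline used in the studies above), assuming a 1-D array of samples and its sampling rate:

```python
import numpy as np

def estimate_formants(frame, sr, order=12, n_formants=2):
    """Estimate the lowest formants (Hz) of a voiced frame via autocorrelation LPC."""
    x = frame * np.hamming(len(frame))                  # taper the analysis frame
    # Autocorrelation method: solve the normal equations R a = r.
    ac = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[ac[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, ac[1:order + 1])
    # Roots of A(z) = 1 - a1*z^-1 - ... - ap*z^-p; keep one of each conjugate pair.
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0]
    freqs = np.sort(np.angle(roots) * sr / (2.0 * np.pi))
    freqs = freqs[freqs > 90.0]                         # drop near-DC roots
    return freqs[:n_formants]                           # lowest resonances ≈ F1, F2
```

On this acoustic picture, a wider mouth aperture raises F1 and a more forward tongue position raises F2, which is why the grasping and bringing-to-the-mouth effects appear in those two parameters, respectively.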
The possibility that actions directed to a target might
have been used to communicate is supported by the finding
that the observation of pantomimes influences speech in
the same way that observation of the corresponding real
actions does (Gentilucci et al., 2004a; see Fig. 3). The close relationship between representations of actions and spoken
language is supported also by neuroimaging studies, which
show activation of Broca’s area when representing meaningful arm gestures (Buccino et al., 2001; Decety et al.,
1997; Gallagher and Frith, 2004; Grèzes et al., 1998).
Motor imagery of hand movements has also been shown to
activate both Broca’s and left premotor ventral areas
(Gerardin et al., 2000; Grafton et al., 1996; Hanakawa
et al., 2003; Kuhtz-Buschbeck et al., 2003; Parsons et al.,
1995).
The course of events in the evolution of language may be
paralleled by those in the development of language in
children, which also appears to involve the system of
observation/execution of actions directed to a target
(Gentilucci et al., 2004b). This is in accordance with the
notion that a close relationship exists between early speech
development in children and several aspects of manual
activity, such as communicative gestures (Volterra et al., 2005; Bates and Dick, 2002).
Fig. 2. Effects of the observation of the grasping action on the lip kinematics and the voice spectra of the syllable BA (/ba/) pronounced during action observation. Children (C) and adults (A) participated in the study. The object grasped by the actor was either a cherry (small object) or an apple (large object). (A) Hand shaping used by the actor when grasping the cherry and the apple. (B) Participants' lip kinematics. The upper panels show examples of the time course of lip opening and closing during syllable pronunciation, i.e., the curves show the time course of the distance between two markers placed on the upper and lower lip, respectively. Triangles and circles refer to lip movements when observing the grasp of the cherry and of the apple, respectively. The lower panels show the values of kinematic parameters averaged across subjects. Peak velocity of lip aperture and maximal lip aperture significantly increased when observing the grasp of the apple as compared to the grasp of the cherry. No significant interaction between age group (adults vs. children) and fruit (cherry vs. apple) was found. (C) Parameters of the voice spectra of the syllable BA pronounced during grasp observation. The panels show the mean values of formant 1 (F1) and formant 2 (F2). F1 significantly increased when observing the grasp of the apple compared with the grasp of the cherry. This effect was greater in children than in adults. *Significant main effect of fruit. **Significant interaction between age group and fruit. Bars are standard errors (SE).
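The lip-kinematic parameters plotted in Fig. 2 (the aperture time course, maximal aperture, and peak velocity of aperture) all derive from the distance between the two lip markers. A minimal sketch of how such parameters can be computed, assuming hypothetical marker trajectories sampled at fs Hz (illustrative code, not the authors' motion-capture pipeline):

```python
import numpy as np

def lip_aperture_kinematics(upper_lip, lower_lip, fs):
    """Lip aperture (mm), maximal aperture, and peak aperture velocity (mm/s).

    upper_lip, lower_lip: (n_samples, 3) marker coordinates in mm; fs in Hz.
    """
    aperture = np.linalg.norm(upper_lip - lower_lip, axis=1)  # inter-marker distance
    velocity = np.gradient(aperture, 1.0 / fs)                # d(aperture)/dt
    return aperture, aperture.max(), velocity.max()           # peak opening velocity
```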
For example, canonical
babbling in children aged from 6 to 8 months is
accompanied by rhythmic hand movements (Masataka,
2001). Manual gestures predate early development of
speech in children, and predict later success even up to
the two-word level (Iverson and Goldin-Meadow, 2005).
Word comprehension in children between 8 and 10 months
and word productions between 11 and 13 months are
accompanied by deictic and recognition gestures, respectively (Bates and Snyder, 1987; Volterra et al., 1979).
From a behavioral point of view, words and manual
gestures are communicative signals, which according to
McNeill (1992) are synchronized with one another,
suggesting that speech and gesture form a single, integrated
system. Further support for the integration of speech and
gesture comes from Bernardis and Gentilucci (2006), who
showed that voice spectra parameters of words pronounced
simultaneously with execution of the corresponding-in-meaning gesture increased in comparison with those
resulting from word pronunciation alone. This was not
observed when the gesture was meaningless. Conversely,
pronouncing words slowed down the simultaneous execution of the gesture, which did not occur when pseudo-words were pronounced. These effects of voice enhancement and arm inhibition were interpreted as due to a
process of transferring some aspects (such as the intention
to interact closely) from the gesture to the word (Bernardis
and Gentilucci, 2006). On the other hand, the verbal
response to a message expressed by the combination of
word and gesture is different from that to either communication signal alone. In fact, the voice spectra of words
pronounced in response to simultaneously listening to and
observing the speaker making the corresponding-in-meaning gesture are enhanced, just as they are by the
simultaneous production of both word and gesture
(Bernardis and Gentilucci, 2006). Broca’s area is probably
involved in the simultaneous control of gestures and word
pronunciation. Indeed, the effects of gesture observation
on word pronunciation, described above, were extinguished during temporary inactivation of this area using
repetitive transcranial magnetic stimulation (Gentilucci
et al., in press).
In summary, the connections between hand and mouth
reviewed above may have been established initially in the
context of ingestive movements of the mouth, and the acts
of grasping and bringing food to the mouth, but adapted
later for communication. MacNeilage (1998) has suggested that speech itself originated from repetitive ingestive movements of the mouth.
Fig. 3. Effects of the execution and observation of the bringing-to-the-mouth action on the voice spectra of the syllable BA (/ba/) pronounced during action execution/observation. The object brought to the mouth was either a cherry (small object) or an apple (large object). Upper panels show the execution effects. During the task the participant (shown in the panel) executed the action and simultaneously either pronounced the syllable BA or emitted a vocalization unrelated to Italian (/œ/). Lower panels show the observation effects. During the task, the participant observed the actor (shown in the panels) executing the action and simultaneously pronounced the syllable BA. (A) Execution and pronunciation of BA. (B) Execution and vocalization unrelated to Italian. (C) Observation of the action and pronunciation of BA. (D) Observation of a pantomime of the action and pronunciation of BA. Note that during the pantomime neither object was presented, nor did the mouth open. (E) Observation of a pantomime of the action executed with a non-biological arm (i.e., a shape of the arm) and pronunciation of BA. F1: Formant 1. F2: Formant 2. Gray bars (cherry) and black bars (apple) refer to the object brought to the mouth. Execution of bringing the apple to the mouth induced an increase in F2 as compared to the action executed with the cherry (A). The action did not affect a vocalization unrelated to Italian (B). Observation of the action (C) and of a pantomime of the action (D) induced the same effects as execution of these actions. No effect was found when the pantomime was executed with a non-biological arm (E).
This may well be correct, but we
suggest that it is only half the story, since it neglects the
important role, in primates at least, of hand and arm
movements in eating.
4. Evolutionary speculations
4.1. When did the changes occur?
Although the connections between hand and mouth were
probably well established in our primate forebears, fully
articulate vocalization may not have been possible until
fairly late in hominid evolution, and perhaps not until the
emergence of our own species, Homo sapiens. As we have
seen, there is little if any cortical control over vocalization
in nonhuman primates (Ploog, 2002), and it has proven
virtually impossible to teach chimpanzees anything approaching human speech (Hayes, 1952). Moreover, fossil
evidence suggests that the alterations to the vocal tract
(e.g., D. Lieberman, 1998; Lieberman et al., 1972) and to
the mechanisms of breath control (MacLarnon and Hewitt,
1999, 2004) necessary for articulate speech were not
completed until late in hominid evolution, and perhaps
only with the emergence of our own species, H. sapiens,
which is dated at some 170,000 years ago (Ingman et al.,
2000).
A further clue comes from study of an extended family in
England, known as the KE family. Half of the members of
this family are affected by a disorder of speech and
language, which is evident from the affected child’s first
attempts to speak and persists into adulthood (Vargha-Khadem et al., 1995). The disorder is now known to be due to a point mutation of the FOXP2 gene (forkhead box P2) on chromosome 7 (Fisher et al., 1998; Lai et al., 2001). For
normal speech to be acquired, two functional copies of this
gene seem to be necessary. The nature of the deficit, and
therefore the role of the FOXP2 gene, have been debated.
Some have argued that the FOXP2 gene is involved in the
development of morphosyntax (Gopnik, 1990), and it has
even been identified more broadly as the ‘‘grammar gene’’
(Pinker, 1994)—although Pinker (2003) has since recognized that other genes probably also played a role in the
evolution of grammar. Subsequent investigation suggests,
however, that the core deficit in affected members of the
KE family is one of articulation, with grammatical
impairment a secondary outcome (Watkins et al., 2002a).
It may therefore play a role in the incorporation of vocal
articulation into the mirror system, but have little to do
with grammar itself (Corballis, 2004a).
This is supported by a study in which fMRI was used to
record brain activity in both affected and unaffected
members of the KE family while they covertly generated
verbs in response to nouns (Liégeois et al., 2003). Whereas
unaffected members showed the expected activity concentrated in Broca’s area in the left hemisphere, affected
members showed relative underactivation in both Broca’s
area and its right-hemisphere homologue, as well as in
other cortical language areas. They also showed overactivation bilaterally in regions not associated with
language. However, there was bilateral activation in the
posterior superior temporal gyrus; the left side of this area
overlaps Wernicke’s area, important in the comprehension
of language. This suggests that affected members may have
generated words in terms of their sounds, rather than in
terms of articulatory patterns. Their deficits were not
attributable to any difficulty with verb generation itself,
since affected and unaffected members did not differ in
their ability to generate verbs overtly, and the patterns of
brain activity were similar to those recorded during covert
verb generation. Another study based on structural MRI
showed morphological abnormalities in the same areas
(Watkins et al., 2002b).
The FOXP2 gene is highly conserved in mammals, and in
humans differs in only three places from that in the mouse,
but two of the three changes occurred on the human
lineage after the split from the common ancestor with the
chimpanzee and bonobo. A recent estimate of the date of
the more recent of these mutations suggests that it occurred
‘‘since the onset of human population growth, some
10,000–100,000 years ago’’ (Enard et al., 2002, p. 871).
If this is so, fully articulate vocal language may not have
emerged until after the appearance of our species,
H. sapiens, some 170,000 years ago in Africa.
This is not to say that the FOXP2 gene was the only gene
involved in the switch to an autonomously vocal system;
rather, it was probably just the final step in a series of
progressive changes. Selective changes to the vocal tract,
breathing, and cortical control of vocal language suggest
that there must have been selective pressure to replace a
system that was largely based on manual and facial
gestures to one that could rely almost exclusively on
vocalization, albeit with manual accompaniments. Why,
then, would such pressure have existed? One factor may
have been greater energy requirements associated with
gesture; we have anecdotal evidence from those attending
courses in sign language that the instructors required
regular massages in order to meet the sheer physical
demands of sign language expression. The physiological
costs of speech, in contrast, are so low as to be nearly
unmeasurable (Russell et al., 1998). The switch would also
have allowed communication at night, or when speakers
and listeners are out of visual contact, and would have
freed the hands for other activities, including the use and
manufacture of tools. Vocal language allows people to
speak and use tools at the same time, leading perhaps to
pedagogy (Corballis, 2002). This may well have been one of
the factors underlying the so-called ‘‘human revolution,’’
which we discuss next.
4.2. The ‘‘human revolution’’
The ‘‘human revolution’’ (Mellars and Stringer, 1989)
refers to the dramatic appearance of more sophisticated
tools, bodily ornamentation, art, and perhaps music,
dating from some 40,000 years ago in Europe, and
probably earlier in Africa (McBrearty and Brooks, 2000;
Oppenheimer, 2003). Despite some imprecision in the
estimates of the dates of both the FOXP2 mutation and the
human revolution, the two dates are fairly close, and
suggest that the mutation of the FOXP2 gene may have
been the final step in the evolution of autonomous speech.
This raises the possibility that the final incorporation of
vocalization into the mirror system was critical to the
emergence of modern human behavior in the Upper
Paleolithic (Corballis, 2004b).
The human revolution is more commonly attributed to
the emergence of symbolic language itself than to the
emergence of speech (e.g., Klein et al., 2004; Mellars, 2004).
This implies that language must have evolved very late, and
quite suddenly, in hominid evolution. Some have associated it with the arrival of our own species, H. sapiens,
about 170,000 years ago. Bickerton (1995), for example,
writes that ‘‘ytrue language, via the emergence of syntax,
was a catastrophic event, occurring within the first few
generations of Homo sapiens sapiens.’’ Crow (2002) has
similarly proposed that the emergence of language was part
of the speciation event that gave rise to H. sapiens. The
association of the evolutionary explosion with the human
revolution suggests that language may have emerged even
later, as proposed by Klein et al. (2004), although there is
still debate over the extent and time frame of the human
revolution (e.g., McBrearty and Brooks, 2000).
Given the complexity of syntax, still not fully understood
by linguists, it seems unlikely that these ‘‘big bang’’ theories
of language evolution can be correct. It seems much more
likely that language evolved incrementally, perhaps beginning with the emergence of the genus Homo from around 2
million years ago. Pinker and Bloom (1990) argue, contrary
to earlier views expressed by Chomsky (1975), that
language evolved incrementally through natural selection,
and Jackendoff (2002) has proposed a series of stages
through which this might have occurred. In something of a
change of stance for Chomsky, Hauser et al. (2002) have
also highlighted a continuity between primate and human
communication, again suggesting the gradual evolution
of human language—although they do not consider the
possibility that language evolved from manual and
facial gestures, nor do they speculate as to precisely when
the uniquely human component (what they call ‘‘faculty
of language in the narrow sense’’) emerged in
hominid evolution. If syntactic language evolved
gradually over the past 2 million years, then it seems
reasonable to suppose that it was already well developed by
the time H. sapiens appeared a mere 170,000 or so
years ago. As we have seen, it now seems likely that the
FOXP2 gene has to do with oral–motor control rather than
with syntax.
One may question whether the switch to a fully
autonomous vocal language could have brought about an
effect as apparently profound as the human revolution. As
noted above, speech would have freed the hands, enhancing pedagogy, which itself may be a uniquely human
characteristic (e.g., Csibra and Gergely, in press). More
generally, changes in the medium of communication have
had deep influences on our material culture. Without the
advent of writing, and the later development of mathematical notation, for example, we would surely not have had
our modern contrivances such as the automobile, or the
supersonic jet. The Internet may well prove to have
comparable effects. We suggest, then, that the switch from
a manuo-facial to a vocal means of communication would
have especially enhanced material culture, including the
manufacture and use of tools. Indeed, it is primarily in
material culture that the human revolution is manifest,
whereas the earlier evolution of language itself may have
been expressed in, and perhaps driven by, complex social
interaction, or what has been called cultural cognition
(Tomasello et al., 2005). The social component may be less
visible in the archeological record. The human revolution
may therefore give a false impression of the evolution of
the human mind itself.
5. Conclusion
In conclusion, a system based on iconic and progressively symbolic gestures evolved from an initial gesture
communication system based on pantomimes of actions.
Grammar might have evolved as the sequence of hand and
arm gestures increased in complexity. In line with
Corballis’s (2002) proposal, at the various stages of
evolution arm postures were integrated with mouth
articulation postures by the double hand–mouth command
system. Autonomy of speech from the arm-gesture communication system, at least to the point that language can
be understood through speech alone, was probably reached
when the alterations of the vocal tract and vocal control
necessary for articulate speech were completed. Only at this
point could the signal be carried autonomously by the
vocal system. This stage may not have been reached until
the emergence of our own species, H. sapiens, and may
have been facilitated by the mutation of the FOXP2 gene
within the past 100,000 years.
Acknowledgments
This work was supported by a grant from MIUR (Ministero dell'Istruzione Universitaria e della Ricerca) to M.G. We
thank Karen Emmorey, Michael Studdert-Kennedy, and
Len Talmy for helpful discussion, although they do not
necessarily agree with our conclusions.
References
Ajello, R., Mazzoni, L., Nicolai, F., 2001. Linguistic gestures: mouthing in
Italian sign languages (LIS). In: Sutton-Spence, R., Boyes-Braem, P.
(Eds.), The Hands are the Head of the Mouth: The Mouth as
Articulator in Sign Language. Signum-Verlag, Hamburg, Germany,
pp. 231–246.
Arbib, M.A., 2005. From monkey-like action recognition to human
language: an evolutionary framework for neurolinguistics. Behavioral
and Brain Sciences 28, 105–168.
Armstrong, D.F., 1999. Original Signs: Gesture, Sign, and the Source of
Language. Gallaudet University Press, Washington, DC.
Armstrong, D.F., Stokoe, W.C., Wilcox, S.E., 1995. Gesture and the
Nature of Language. Cambridge University Press, Cambridge, MA.
Bates, E., Dick, F., 2002. Language, gesture, and the developing brain.
Developmental Psychobiology 40, 293–310.
Bates, E., Snyder, L.S., 1987. The cognitive hypothesis in language
development. In: Ina, E., Uzgiris, C., McVicker Hunt, E.J. (Eds.),
Infant Performance and Experience: New Findings with the Ordinal
Scales. University of Illinois Press, Urbana, IL, USA, pp. 168–204.
Bergman, B., Wallin, L., 2001. A preliminary analysis of visual mouth
segments in Swedish Sign Language. In: Sutton-Spence, R., Boyes-Braem, P. (Eds.), The Hands are the Head of the Mouth: The Mouth
as Articulator in Sign Language. Signum-Verlag, Hamburg, Germany,
pp. 51–68.
Bernardis, P., Gentilucci, M., 2006. Speech and gesture share the same
communication system. Neuropsychologia 44, 178–190.
Bickerton, D., 1995. Language and Human Behavior. University of
Washington Press, Seattle, WA.
Brentari, D., 1998. A Prosodic Model of Sign Language Phonology. MIT
Press, Cambridge, MA.
Browman, C.P., Goldstein, L.F., 1995. Dynamics and articulatory
phonology. In: van Gelder, T., Port, R.F. (Eds.), Mind as Motion.
MIT Press, Cambridge, MA, pp. 175–193.
Buccino, G., Binkofski, F., Fink, G.R., Fadiga, L., Fogassi, L., Gallese,
V., et al., 2001. Action observation activates premotor and parietal
areas in a somatotopic manner: an fMRI study. European Journal of
Neuroscience 13, 400–404.
Burling, R., 1999. Motivation, conventionalization, and arbitrariness in
the origin of language. In: King, B.J. (Ed.), The Origins of Language:
What Nonhuman Primates can Tell Us. School of American Research
Press, Santa Fe, NM, pp. 307–350.
Chomsky, N., 1975. Reflections on Language. Pantheon, New York.
de Condillac, E.B., 1971. An Essay on the Origin of Human Knowledge:
Being a Supplement to Mr. Locke’s Essay on the Human Understanding (A facsimile reproduction of the 1756 translation by T.
Nugent of Condillac’s 1747 essay). Scholars’ Facsimiles and Reprints,
Gainesville, FL.
Corballis, M.C., 1992. On the evolution of language and generativity.
Cognition 44, 197–226.
Corballis, M.C., 2002. From Hand to Mouth: The Origins of Language.
Princeton University Press, Princeton, NJ.
Corballis, M.C., 2004a. FOXP2 and the mirror system. Trends in
Cognitive Sciences 8, 95–96.
Corballis, M.C., 2004b. The origins of modernity: was autonomous speech
the critical factor? Psychological Review 111, 543–552.
Crow, T.J., 2002. Sexual selection, timing, and an X–Y homologous gene:
did Homo sapiens speciate on the Y chromosome? In: Crow, T.J. (Ed.),
The Speciation of Modern Homo Sapiens. Oxford University Press,
Oxford, UK, pp. 197–216.
Csibra, G., Gergely, G., in press. Social learning and social cognition: The
case for pedagogy. In: Johnson, M.H., Munakata, Y. (Eds.), Processes
of Change in Brain and Cognitive Development. Attention and
Performance XXI. Oxford University Press, Oxford, UK.
Decety, J., Grezes, J., Costes, N., Perani, D., Jeannerod, M., Procyk, E.,
et al., 1997. Brain activity during observation of actions. Influence of
action content and subject’s strategy. Brain 120, 1763–1777.
Donald, M., 1991. Origins of the Modern Mind. Harvard University
Press, Cambridge, MA.
Emmorey, K., 2002. Language, Cognition, and Brain: Insights from Sign
Language Research. Erlbaum, Hillsdale, NJ.
Enard, W., Przeworski, M., Fisher, S.E., Lai, C.S.L., Wiebe, V., Kitano,
T., et al., 2002. Molecular evolution of FOXP2, a gene involved in
speech and language. Nature 418, 869–871.
Fadiga, L., Fogassi, L., Pavesi, G., Rizzolatti, G., 1995. Motor facilitation
during action observation—a magnetic stimulation study. Journal of
Neurophysiology 73, 2608–2611.
Ferrari, P.F., Gallese, V., Rizzolatti, G., Fogassi, L., 2003. Mirror neurons
responding to the observation of ingestive and communicative mouth
actions in the monkey ventral premotor cortex. European Journal of
Neuroscience 17, 1703–1714.
Fisher, S.E., Vargha-Khadem, F., Watkins, K.E., Monaco, A.P.,
Pembrey, M.E., 1998. Localisation of a gene implicated in a severe
speech and language disorder. Nature Genetics 18, 168–170.
Gallagher, H.L., Frith, C.D., 2004. Dissociable neural pathways for the
perception and recognition of expressive and instrumental gestures.
Neuropsychologia 42, 1725–1736.
Gallese, V., Fadiga, L., Fogassi, L., Rizzolatti, G., 1996. Action
recognition in the premotor cortex. Brain 119, 593–609.
Gallese, V., Fadiga, L., Fogassi, L., Rizzolatti, G., 2002. Action
representation and the inferior parietal lobule. In: Prinz, W., Hommel,
B. (Eds.), Common Mechanisms in Perception and Action, Attention
and Performance XIX, III Action perception and imitation. Oxford
University Press, Oxford, UK, pp. 334–355.
Gardner, R.A., Gardner, B.T., 1969. Teaching sign language to a
chimpanzee. Science 165, 664–672.
Gentilucci, M., 2003. Grasp observation influences speech production.
European Journal of Neuroscience 17, 179–184.
Gentilucci, M., Benuzzi, F., Gangitano, M., Grimaldi, S., 2001. Grasp
with hand and mouth: a kinematic study on healthy subjects. Journal
of Neurophysiology 86, 1685–1699.
Gentilucci, M., Santunione, P., Roy, A.C., Stefanini, S., 2004a. Execution
and observation of bringing a fruit to the mouth affect syllable
pronunciation. European Journal of Neuroscience 19, 190–202.
Gentilucci, M., Stefanini, S., Roy, A.C., Santunione, P., 2004b. Action
observation and speech production: study on children and adults.
Neuropsychologia 42, 1554–1567.
Gentilucci, M., Bernardis, P., Crisi, G., Dalla Volta, R., in press.
Repetitive transcranial stimulation of Broca’s area affects verbal
responses to gesture observation. Journal of Cognitive Neuroscience.
Gerardin, E., Sirigu, A., Lehericy, S., Poline, J.B., Gaymard, B., Marsault,
C., et al., 2000. Partially overlapping neural networks for real and
imagined hand movements. Cerebral Cortex 10, 1093–1104.
Givòn, T., 1995. Functionalism and Grammar. Benjamins, Philadelphia,
PA.
Gopnik, M., 1990. Feature-blind grammar and dysphasia. Nature 344,
715.
Grafton, S.T., Arbib, M.A., Fadiga, L., Rizzolatti, G., 1996. Localization
of grasp representations in humans by positron emission tomography.
2. Observation compared with imagination. Experimental Brain
Research 112, 103–111.
Grèzes, J., Costes, N., Decety, J., 1998. Top-down effect of strategy on the
perception of human biological motion: a PET investigation. Cognitive
Neuropsychology 15, 553–582.
Hanakawa, T., Immisch, I., Toma, K., Dimyan, M.A., Van Gelderen, P.,
Hallett, M., 2003. Functional properties of brain areas associated with
motor execution and imagery. Journal of Neurophysiology 89,
989–1002.
Hari, R., Forss, N., Avikainen, S., Kirveskari, E., Salenius, S., Rizzolatti,
G., 1998. Activation of human primary motor cortex during action
observation: a neuromagnetic study. Proceedings of the National
Academy of Sciences of the USA 95, 15061–15065.
Hauser, M.D., Fitch, W.T., Chomsky, N., 2002. The faculty of language:
what is it, who has it, and how did it evolve? Science 298, 1569–1579.
Hayes, C., 1952. The Ape in Our House. Gollancz, London.
Hepp-Reymond, M.-C., 1988. Functional organization of motor cortex
and its participation in voluntary movements. In: Steklis, H.D., Erwin,
J. (Eds.), Comparative Primate Biology, Vol. 4. Neurosciences. Alan
R. Liss, New York, pp. 501–624.
Hewes, G.W., 1973. Primate communication and the gestural origins of
language. Current Anthropology 14, 5–24.
Hockett, C.F., 1960. The origin of speech. Scientific American 203 (3),
88–96.
Iacoboni, M., Woods, R.P., Brass, M., Bekkering, H., Mazziotta, J.C.,
Rizzolatti, G., 1999. Cortical mechanisms of human imitation. Science
286, 2526–2528.
Ingman, M., Kaessmann, H., Pääbo, S., Gyllensten, U., 2000. Mitochondrial genome variation and the origin of modern humans. Nature 408,
708–713.
Iverson, J.M., Goldin-Meadow, S., 2005. Gesture paves the way for
language development. Psychological Science 16, 367–371.
Jackendoff, R., 2002. Foundations of Language: Brain, Meaning,
Grammar, Evolution. Oxford University Press, Oxford, UK.
Joos, M., 1948. Acoustic Phonetics. Language Monograph No. 23.
Linguistic Society of America, Baltimore, MD.
Klein, R.G., Avery, G., Cruz-Uribe, K., Halkett, D., Parkington, J.E.,
Steele, T., et al., 2004. The Ysterfontein 1 Middle Stone Age site, South
Africa, and early human exploitation of coastal resources. Proceedings
of the National Academy of Sciences of the USA 101, 5708–5715.
Kohler, E., Keysers, C., Umiltà, M.A., Fogassi, L., Gallese, V., Rizzolatti,
G., 2002. Hearing sounds, understanding actions: action representation in mirror neurons. Science 297, 846–848.
Kuhtz-Buschbeck, J.P., Mahnkopf, C., Holzknecht, C., Siebner, H.,
Ulmer, S., Jansen, O., 2003. Effector-independent representations of
simple and complex imagined finger movements: a combined fMRI
and TMS study. European Journal of Neuroscience 18, 3375–3387.
Lai, C.S., Fisher, S.E., Hurst, J.A., Vargha-Khadem, F., Monaco, A.P.,
2001. A novel forkhead-domain gene is mutated in a severe speech and
language disorder. Nature 413, 519–523.
Liberman, A.M., Cooper, F.S., Shankweiler, D.S., Studdert-Kennedy, M.,
1967. Perception of the speech code. Psychological Review 74,
431–461.
Liddell, S., Johnson, R., 1989. American Sign Language: The phonological base. Sign Language Studies 64, 197–277.
Lieberman, D., 1998. Sphenoid shortening and the evolution of modern
cranial shape. Nature 393, 158–162.
Lieberman, P., Crelin, E.S., Klatt, D.H., 1972. Phonetic ability and related
anatomy of the new-born, adult human, Neanderthal man, and the
chimpanzee. American Anthropologist 74, 287–307.
Liégeois, F., Baldeweg, T., Connelly, A., Gadian, D.G., Mishkin, M.,
Vargha-Khadem, F., 2003. Language fMRI abnormalities associated
with FOXP2 gene mutation. Nature Neuroscience 6, 1230–1237.
MacLarnon, A., Hewitt, G., 1999. The evolution of human speech: The
role of enhanced breathing control. American Journal of Physical
Anthropology 109, 341–363.
MacLarnon, A., Hewitt, G., 2004. Increased breathing control: another
factor in the evolution of human language. Evolutionary Anthropology 13, 181–197.
MacNeilage, P.F., 1998. The frame/content theory of evolution of speech.
Behavioral and Brain Sciences 21, 499–546.
Masataka, N., 2001. Why early linguistic milestones are delayed in
children with Williams syndrome: late onset of hand banging as a
possible rate-limiting constraint on the emergence of canonical
babbling. Developmental Science 4, 158–164.
McBrearty, S., Brooks, A.S., 2000. The revolution that wasn’t: a new
interpretation of the origin of modern human behavior. Journal of
Human Evolution 39, 453–563.
McNeill, D., 1992. Hand and Mind: What Gestures Reveal about
Thought. University of Chicago Press, Chicago.
Mellars, P.A., 2004. Neanderthals and the modern human colonization of
Europe. Nature 432, 461–465.
Mellars, P.A., Stringer, C.B. (Eds.), 1989. The Human Revolution:
Behavioural and Biological Perspectives on the Origins of Modern
Humans. Edinburgh University Press, Edinburgh.
Murata, A., Fadiga, L., Fogassi, L., Gallese, V., Raos, V., Rizzolatti, G.,
1997. Object representation in the ventral premotor cortex (area F5) of
the monkey. Journal of Neurophysiology 78, 2226–2230.
Muthukumaraswamy, S.D., Johnson, B.W., McNair, N.A., 2004. Mu
rhythm modulation during observation of an object-directed grasp.
Cognitive Brain Research 19, 195–201.
Neidle, C., Kegl, J., MacLaughlin, D., Bahan, B., Lee, R.G., 2000. The
Syntax of American Sign Language. MIT Press, Cambridge, MA.
Oppenheimer, S., 2003. Out of Eden: The Peopling of the World.
Constable, London.
Paget, R., 1930. Human Speech: Some Observations, Experiments and
Conclusions as to the Nature, Origin, Purpose and Possible Improvement of Human Speech. Kegan Paul, Trench, Trubner & Co., New
York, NY.
Parsons, L.M., Fox, P.T., Downs, J.H., Glass, T., Hirsch, T.B., Martin,
C.C., et al., 1995. Use of implicit motor imagery for visual shape
discrimination as revealed by PET. Nature 375, 54–58.
Pinker, S., 1994. The Language Instinct. Morrow, New York.
Pinker, S., 2003. Language as an adaptation to the cognitive niche. In:
Christiansen, M.H., Kirby, S. (Eds.), Language Evolution. Oxford
University Press, Oxford, pp. 16–37.
Pinker, S., Bloom, P., 1990. Natural language and natural selection.
Behavioral and Brain Sciences 13, 707–784.
Ploog, D., 2002. Is the neural basis of vocalisation different in non-human
primates and Homo sapiens? In: Crow, T.J. (Ed.), The Speciation of
Modern Homo Sapiens. Oxford University Press, Oxford, pp. 121–135.
Rizzolatti, G., Arbib, M.A., 1998. Language within our grasp. Trends in
Neurosciences 21, 188–194.
Rizzolatti, G., Luppino, G., 2001. The cortical motor system. Neuron 31,
889–901.
Rizzolatti, G., Camarda, R., Fogassi, L., Gentilucci, M., Luppino, G.,
Matelli, M., 1988. Functional organization of inferior area 6 in the
macaque monkey. II. Area F5 and the control of distal movements.
Experimental Brain Research 71, 491–507.
Rizzolatti, G., Fadiga, L., Gallese, V., Fogassi, L., 1996. Premotor cortex
and the recognition of motor actions. Cognitive Brain Research 3,
131–141.
Rizzolatti, G., Fogassi, L., Gallese, V., 2001. Neurophysiological
mechanisms underlying the understanding and imitation of action.
Nature Reviews Neuroscience 2, 661–670.
Rousseau, J.J., 1755/1964. Discours sur l’origine et les fondements de
l’inégalité parmi les hommes. In: Gagnebin, B., Raymond, M. (Eds.),
Oeuvres Complètes, vol. 3. Gallimard, Paris.
Ruben, R.J., 2005. Sign language: its history and contribution to the
understanding of the biological nature of language. Acta Oto-Laryngologica 125, 464–467.
Russell, B.A., Cerny, F.J., Stathopoulos, E.T., 1998. Effects of varied
vocal intensity on ventilation and energy expenditure in women and
men. Journal of Speech, Language and Hearing Research 41, 239–248.
Sandler, W., 1989. Phonological Representation of the Sign: Linearity and
Nonlinearity in American Sign Language. Foris Publications, Dordrecht, The Netherlands.
Savage-Rumbaugh, S., Shanker, S.G., Taylor, T.J., 1998. Apes, Language,
and the Human Mind. Oxford University Press, New York.
Stokoe, W.C., 1960. Sign Language Structure: An Outline of the
Communicative Systems of the American Deaf. Linstok Press, Silver
Spring, MD.
Stokoe, W.C., 1991. Semantic phonology. Sign Language Studies 71,
107–114.
Studdert-Kennedy, M., 2005. How did language go discrete? In: Tallerman, M. (Ed.), Language Origins: Perspectives on Evolution. Oxford
University Press, Oxford, pp. 48–67.
Sutton-Spence, R., Day, L., 2001. Mouthings and mouth gestures in
British Sign Language. In: Boyes-Braem, P., Sutton-Spence, R. (Eds.),
The Hands are the Head of the Mouth: The Mouth as Articulator in
Sign Languages. Signum-Verlag, Hamburg, pp. 69–86.
Talmy, L., in press. Recombinance in the evolution of language. In:
Cihlar, J.E., Kaiser, D., Kimbara, I., Franklin, A. (Eds.), Proceedings
of the 39th Annual Meeting of the Chicago Linguistic Society. Chicago
Linguistic Society, Chicago, IL.
Thompson, R., Emmorey, K., Gollan, T.H., 2005. ‘‘Tip of the fingers’’
experiences by deaf signers. Psychological Science 16, 856–860.
Tomasello, M., Carpenter, M., Call, J., Behne, T., Moll, H., 2005.
Understanding and sharing intentions: the origins of cultural cognition.
Behavioral and Brain Sciences 28, 635–673.
Van der Hulst, H., 1993. Units in the analysis of signs. Phonology 10,
209–241.
Van Hooff, J.A.R.A.M., 1962. Facial expressions in higher primates.
Symposium of the Zoological Society of London 8, 97–125.
Van Hooff, J.A.R.A.M., 1967. The facial displays of the catarrhine
monkeys and apes. In: Morris, D. (Ed.), Primate Ethology. Weidenfeld and Nicolson, London, pp. 7–68.
Vargha-Khadem, F., Watkins, K.E., Alcock, K.J., Fletcher, P., Passingham, R., 1995. Praxic and nonverbal cognitive deficits in a large
family with a genetically transmitted speech and language disorder.
Proceedings of the National Academy of Sciences of the USA 92,
930–933.
Vico, G.B., 1744/1953. La Scienza Nuova. Laterza, Bari.
Volterra, V., 1987/2004. La lingua italiana dei segni. Il Mulino, Bologna.
Volterra, V., Bates, E., Benigni, L., Bretherton, I., Camaioni, L., 1979.
First words in language and action: a qualitative look. In: Bates, E.,
Benigni, L., Bretherton, I., Camaioni, L., Volterra, V. (Eds.), The
Emergence of Symbols: Cognition and Communication in Infancy.
Academic Press, New York, pp. 141–222.
Volterra, V., Caselli, M.C., Capirci, O., Pizzuto, E., 2005. Gesture and
the emergence and development of language. In: Tomasello,
M., Slobin, D.I. (Eds.), Beyond Nature–Nurture: Essays in Honor
of Elizabeth Bates. Lawrence Erlbaum Associates, Mahwah, NJ,
pp. 3–40.
Warren, R.M., Obusek, C.J., Farmer, R.M., Warren, R.P., 1969.
Auditory sequence: confusion of patterns other than speech or music.
Science 164, 586–587.
Watkins, K.E., Dronkers, N.F., Vargha-Khadem, F., 2002a. Behavioural
analysis of an inherited speech and language disorder: comparison
with acquired aphasia. Brain 125, 452–464.
Watkins, K.E., Vargha-Khadem, F., Ashburner, J., Passingham, R.E.,
Connelly, A., Friston, K.J., et al., 2002b. MRI analysis of an inherited
speech and language disorder: structural brain abnormalities. Brain
125, 465–478.
Woll, B., 2002. The sign that dares to speak its name: echo phonology
in British Sign Language (BSL). In: Boyes-Braem, P., Sutton-Spence, R. (Eds.), The Hands are the Head of the Mouth: The Mouth
as Articulator in Sign Languages. Signum-Verlag, Hamburg,
pp. 87–98.