Functional imaging during speech production

https://doi.org/10.1016/S0001-6918(01)00026-9

Abstract

Physiological studies of speech production have demonstrated that even simple articulation involves a range of specialized motor and cognitive processes, and the neural mechanisms responsible for speech reflect this complexity. Recently, a number of functional imaging techniques have contributed to our knowledge of the neuroanatomical and neurophysiological correlates of speech production. These new imaging approaches have the advantage of permitting study of large numbers of normal and disordered subjects, but they bring with them a host of new methodological concerns. One of the challenges for understanding language production is the recording of articulation itself. The problems associated with measuring the vocal tract and measuring neural activity during overt speech are reviewed. It is argued that advances in understanding fundamental questions, such as what the planning units of speech are, what role feedback plays during speech, and what influence learning has, await the development of better methods for assessing task performance.

Introduction

The production of spoken words is the end product of a complex network of linguistic and cognitive processes. Thoughts and intentions are remarkably transformed into a sequence of movements and sounds on the order of hundreds of milliseconds. One of the great achievements of psycholinguistic research in the past 30 years has been the preliminary system identification of this language output system. Distinct phases of planning and control in this process have been identified through the study of errors (e.g., Fromkin, 1973) and studies of the timing of production (Levelt, 1989), and formal models of language production have been proposed to account for these data.

Levelt's model is representative of these frameworks and is summarized in Fig. 1. As can be seen in this figure, the progression from concept to articulation is thought to involve separate processes that impart syntactic, morphological, phonological and phonetic organization. At the end of this flow of information processing lies the final and often forgotten stage of the language sequence, articulation itself. Most psycholinguistic models of this kind give minimal attention to the final stage of actual speech production. Like a last-minute addition to a guest list, articulation does not feature in early phases of planning. In fact, in many models of language production, articulation is portrayed as a passive and modular final filter for language production. For a variety of technical reasons, this de-emphasis of articulation has been mirrored in functional imaging studies of language production. Many neural imaging studies have used a range of “silent” tasks in which words are not even spoken. This is unfortunate because it de-emphasizes the medium in which the evolution of language presumably took place, and it makes an unwarranted “pure insertion” assumption about speech motor control (Jennings, McIntosh, Kapur, Tulving, & Houle, 1997).

There are many reasons for giving the articulation component of language production more attention in functional imaging studies. Foremost amongst these is the fact that the speech motor system is not a passive channel for the transmission of linguistic signals from prior planning stages. Rather, it transforms those signals in a number of ways. At the most peripheral level, the nonlinear mechanics of tissue and muscle, the inertial forces of the moving articulators and the complexities of force generation in muscles each contribute to the form of articulation and thus to the final form of the speech output. In order to accommodate this degree of nonlinearity, the nervous system is thought to have internal models of the vocal tract and the acoustic consequences of articulation (Guenther, 1994; Hirayama, Vatikiotis-Bateson, & Kawato, 1994; Jones & Munhall, 2000; Jordan, 1990, 1996; Kawato, Furukawa, & Suzuki, 1987). An internal model is a representation of the forces and kinematics of movements as well as the feedback that results from those movements (see Miall & Wolpert, 1996, and Kawato, 1999, for discussions of internal models in movement control). In this depiction of the act of speaking, articulation involves significant information processing in order to manage the motor system in the vocal tract during phonetic sequencing. Timing, force, and trajectory control of a large number of articulators have to be programmed in rapid succession. This computation engages a neurally distributed planning and control system.
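To make the internal-model idea concrete, here is a minimal sketch, assuming a toy one-dimensional "vocal tract" with an invented nonlinearity: a forward model learns to predict the acoustic consequence of a motor command from its own prediction error. Every function and constant in it is a hypothetical simplification for illustration, not the actual model of Guenther or Kawato.

```python
# Sketch of a forward internal model for a toy one-dimensional articulator.
# All functions and constants are invented for illustration.
import numpy as np

def plant(command):
    """Stand-in for the real vocal tract: a nonlinear motor-to-sound mapping."""
    return np.tanh(1.7 * command) + 0.1 * command ** 3

class ForwardModel:
    def __init__(self, lr=0.05):
        self.gain = 1.0          # the single learned parameter
        self.lr = lr             # learning rate

    def predict(self, command):
        """Predicted acoustic consequence of a motor command."""
        return self.gain * command

    def update(self, command, observed):
        """Adapt from the sensory prediction error (LMS rule)."""
        error = observed - self.predict(command)
        self.gain += self.lr * error * command

model = ForwardModel()
rng = np.random.default_rng(0)
for _ in range(500):
    cmd = rng.uniform(-1.0, 1.0)   # issue a motor command
    heard = plant(cmd)             # auditory feedback (delayed, in reality)
    model.update(cmd, heard)       # learn from the prediction error

print(f"learned gain: {model.gain:.2f}")
```

Once trained, such a predictor lets the controller anticipate the sensory consequences of a command before slow auditory feedback arrives, which is the computational motivation for internal models in movement control.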

The amount of information that must be conveyed and therefore programmed by the motor system is remarkable. Talkers modify their speech style and precision in real time to fit social and environmental context and to respond to conversational demands. Mood, intent, attention, as well as the conceptual and emotional meaning of a message are all transmitted in parallel with phonetic information by subtle differences in the way in which words are spoken. These subtle differences are produced by systematic modifications to the movement timing and trajectories of the oral articulators. For example, talkers can voluntarily modify the clarity with which they speak (Gagne, Masterson, Munhall, Bilida, & Querengesser, 1995) and speech is perceptually distinct if it is read, recited, voluntarily produced, attentionally engaged, etc. (e.g., Remez, Lipton, & Fellowes, 1996). In addition, there is variability in articulation from a number of other sources. Speaking rate, lexical and emphatic stress and syllabification can vary from repetition to repetition.

How this motor planning is accomplished is not well understood, but an interaction between levels of planning is indicated. For example, Saltzman, Lofqvist, and Mitra (2000) have used a phase resetting paradigm to study the relationship between central “clock” mechanisms and the timing of peripheral motor events in speech. When randomly timed mechanical perturbations are applied to the lower lip during repetitive syllable production, the patterns of timing adjustments are consistent with a model in which central timing is modulated by the peripheral motor system (Gracco & Abbs, 1989). Kawato (1997) has suggested that this arrangement is preferred for computational reasons as well: during trajectory planning, the solution space can be reduced more effectively if constraints (e.g., smoothness) are specified at other planning levels. To do this, however, trajectory planning (along with other information processing problems in motor control such as coordinate transformations and motor command generation) must be computed simultaneously rather than sequentially.
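The phase-resetting logic can be illustrated with a toy simulation: a central clock generates periodic "syllable onsets", a single peripheral perturbation shifts the clock through an assumed phase-response curve, and the diagnostic question is whether later onsets keep the shift (the clock itself was reset) or return to the unperturbed schedule. The sinusoidal phase-response curve and all parameter values below are illustrative assumptions, not the model of Saltzman et al.

```python
# Toy phase-resetting simulation: does a peripheral perturbation
# permanently shift the central clock's cycle?
import numpy as np

def run_trial(perturb_phase, strength=0.15, cycles=6, dt=0.001, period=0.4):
    """Return onset times of `cycles` syllables; perturb once at perturb_phase."""
    phase, t, times = 0.0, 0.0, []
    perturbed = False
    while len(times) < cycles:
        phase += dt / period                  # central clock advances uniformly
        if not perturbed and phase >= perturb_phase:
            # assumed sinusoidal phase-response curve
            phase += strength * np.sin(2 * np.pi * phase)
            perturbed = True
        if phase >= 1.0:                      # cycle boundary = "syllable onset"
            phase -= 1.0
            times.append(t)
        t += dt
    return np.array(times)

baseline = run_trial(perturb_phase=2.0)   # phase never reaches 2.0: no perturbation
shifted = run_trial(perturb_phase=0.25)   # perturb early in the first cycle
# A shift that persists across all later onsets, rather than decaying,
# is the signature of the central clock itself being reset.
print("onset shifts (s):", np.round(shifted - baseline, 3))
```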

Understanding articulation, then, is not simply a matter of understanding how the jaw, for example, moves for a canonical phoneme target. Rather, such an understanding must include a description of coordination in the context of the range of performance conditions occurring during communication. Trajectory formation for the articulators would thus involve interaction with many levels of the language system as well as involving complex motor planning in order to achieve these goals in the presence of the biomechanical and physiological complexity of the vocal tract (Munhall, Kawato, & Vatikiotis-Bateson, 2000).

The nonadditivity of cognitive processing and type of response has recently been demonstrated in a semantic judgment task (Jennings et al., 1997). PET activation patterns varied for the same semantic task depending on the response mode (mouse click, overt speech, silent thought). Jennings et al. suggest that the way in which subjects organized their responses induced activation in different areas of the brain for the semantic processing. This kind of interaction is consistent with a system with many reciprocal connections and feedback projections, such as the language system.
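A toy arithmetic example makes clear why such interactions undermine the pure-insertion assumption behind subtraction designs. The activation values below are invented; the point is only that two subtractions meant to isolate the same semantic process disagree when task and response mode interact.

```python
# Invented activation values illustrating a failure of pure insertion.
semantic_plus_speech = 10.0   # semantic task performed with overt speech
speech_alone = 4.0            # overt-speech baseline
semantic_plus_click = 7.0     # same semantic task with a mouse click
click_alone = 3.0             # mouse-click baseline

# Two subtraction estimates of the "same" semantic process:
print(semantic_plus_speech - speech_alone)   # 6.0
print(semantic_plus_click - click_alone)     # 4.0
# The estimates differ, so the semantic process and the response mode
# are not additive and subtraction does not isolate the task.
```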

Factors such as these argue strongly for careful study of speech production in functional imaging studies. From a practical point of view, the manner in which words are spoken defines part of the “task” in any language production study and the presence of real articulation permits more precise task monitoring in imaging studies. Response timing, accuracy and task compliance can only be measured when there actually is a response! There is good evidence that this should be a concern in language production studies. Untrained talkers are not very precise at controlling many speech variables in laboratory studies of language output. For example, speaking rate, in spite of its popularity as a manipulation in speech production studies, is notoriously variable (Miller & Baer, 1983). Rate of articulation has also been reported to influence regional blood flow in PET experiments (e.g., Paus, Perry, Zatorre, Worsley, & Evans, 1996).

Indefrey and Levelt (2000) have recently argued that a range of poorly understood “lead-in” processes used in neuroimaging studies of word production (e.g., picture naming, generating words from beginning letter, word reading, repetition) have to be more carefully considered since they influence neuroimaging results. In a similar vein, details of articulation define a range of “lead-out” processes that will influence imaging data. Variability in speech obviously can result from different neural control strategies at many levels of the production system and this will be reflected in different neural activation patterns.

In addition to these methodological reasons for studying the functional representation of overt speech, there are theoretical reasons as well. Speech can be viewed as a complex linguistic process (Liberman & Whalen, 2000), and thus the neural representation of speech gestures is seen as an inextricable part of the language system. In Liberman's view, the phonetic gestures are the “common currency” of two-way communication and thus are the primitives of human language.

More pragmatically, there is a set of classic problems in the understanding of articulation that can profitably be addressed using functional imaging. These include (a) specifying the fundamental units of speech coordination and understanding how these units are implemented across the range of contexts that influence precision and timing, (b) understanding how central representations of units interact with feedback in real time, and (c) understanding how these units are acquired and modified by learning. For each of these problems, imaging could aid in both the neural mapping of the behavior and potentially the specification of the processing components of the behavior (Kosslyn, 1999). In the final section of the paper I will return to these problems.

To date, functional imaging has played two distinct roles in the study of speaking. First, a form of functional structural imaging has long been important for the precise description of the actual behavior of talking. Second, functional neural imaging has recently contributed to our understanding of the cognitive and linguistic processes responsible for articulation and to the mapping of these putative processes onto the neural architecture. In this paper I will give an overview of these two roles played by imaging, particularly magnetic resonance imaging (MRI), and comment on how functional imaging can help resolve enduring problems in the field. I will begin by commenting on the state of functional structural and functional neural imaging using MR.

Section snippets

Structural imaging of speaking

Sound production in human speech involves the generation of sound (e.g., by the vocal folds) and its filtering by the acoustic properties of the vocal tract (i.e., shape, size, wall characteristics, etc.). This division of explanation into source and filter has been the working model in speech research for the last 50 years (Fant, 1950), but speech bioacoustics and speech motor control are still far from completely understood. In part, this is due to the relative inaccessibility of the speech …
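The source-filter framework lends itself to a compact sketch: a glottal pulse train (the source) is passed through a cascade of second-order resonators whose center frequencies stand in for formants (the filter). The formant frequencies and bandwidths below are rough textbook values for a vowel like /a/, and the whole script is an illustration of the framework, not a research-grade synthesizer; it assumes NumPy and SciPy are available.

```python
# Minimal source-filter vowel synthesis: impulse-train source, formant filter.
import numpy as np
from scipy.signal import lfilter

fs = 16000                        # sampling rate (Hz)
f0 = 120                          # fundamental frequency of voicing (Hz)
n = int(0.5 * fs)                 # half a second of samples

# Source: glottal pulse train at the fundamental frequency
source = np.zeros(n)
source[::fs // f0] = 1.0

def resonator(x, freq, bw, fs):
    """Second-order resonator with center frequency `freq` and bandwidth `bw`."""
    r = np.exp(-np.pi * bw / fs)                 # pole radius from bandwidth
    theta = 2 * np.pi * freq / fs                # pole angle from frequency
    a = [1.0, -2.0 * r * np.cos(theta), r * r]   # denominator coefficients
    return lfilter([1.0 - r], a, x)

# Filter: cascade of resonators at approximate /a/ formants (Hz, bandwidth in Hz)
speech = source
for formant, bandwidth in [(700, 130), (1220, 70), (2600, 160)]:
    speech = resonator(speech, formant, bandwidth, fs)

speech /= np.max(np.abs(speech))  # normalize; write to a WAV file to listen
```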

Functional neural imaging of speaking

It has been known for over a century that multiple regions of the brain are involved in producing spoken language. Our knowledge of these neuroanatomical correlates of speech production primarily comes from deficit/lesion studies (see Caplan, 1992) and electrical stimulation studies (e.g., Ojemann, 1991; Penfield & Roberts, 1959) but recently, a number of imaging techniques have also been used. (See Savoy, 2001, for an extensive historical review.)

The key components of the language production …

Three problems in speech motor control

As indicated above, recording the neural correlates of articulation is a methodologically difficult task. However, the full problem involves more challenges than just the complexity of the data acquisition. As Indefrey and Levelt (2000) suggest, some kind of task decomposition is required before functional imaging of language production can succeed. We need to specify the necessary and sufficient components of speaking aloud. In doing so, we not only must identify the key neural subsystems of …

Conclusions

After more than a decade of functional neural imaging of language production, we have seen significant advances in our understanding of language production as well as the methodologies for studying it. A variety of new imaging and neural recording techniques are available to researchers today (e.g., fMRI, PET, MEG, ERP, etc.), each with its own strengths and weaknesses. The study of articulation, the end product of language production, is particularly challenging and may require the use of …

Acknowledgments

This work was funded by NIH grant DC-00594 from the National Institute on Deafness and Other Communication Disorders and by NSERC. The author wishes to thank Jeff Jones for helpful comments on an earlier draft of this manuscript. Brad Story (Fig. 2) and Mark Tiede (Fig. 3) generously shared their vocal tract images.

References (120)

  • P. Muller-Preuss et al.

    Inhibition of auditory cortical neurons during phonation

    Brain Research

    (1981)
  • S. Nadeau et al.

    Subcortical aphasias

    Brain and Language

    (1997)
  • J. Numminen et al.

    Differential effects of overt, covert and replayed speech on vowel-evoked responses of the human auditory cortex

    Neuroscience Letters

    (1999)
  • J. Numminen et al.

    Subject's own speech reduces reactivity of the human auditory cortex

    Neuroscience Letters

    (1999)
  • H. Op de Beeck et al.

    Can neuroimaging really tell us what the brain is doing? The relevance of indirect measures of population activity

    Acta Psychologica

    (2001)
  • L. Parsons

    Integrating cognitive psychology, neurology, and neuroimaging

    Acta Psychologica

    (2001)
  • J. Perkell et al.

    Speech motor control: acoustic goals, saturation effects, auditory feedback and internal models

    Speech Communication

    (1997)
  • D. Poeppel

    A critical review of PET studies of phonological processing

    Brain and Language

    (1996)
  • R.A. Poldrack

    Imaging brain plasticity: conceptual and methodological issues – a theoretical review

    NeuroImage

    (2000)
  • C.J. Price et al.

    The effect of varying stimulus rate and duration on brain activity during reading

    NeuroImage

    (1996)
  • J.H. Abbs et al.

    Control of complex motor gestures: orofacial muscle responses to load perturbations of the lip during speech

    Journal of Neurophysiology

    (1984)
  • T. Baer et al.

    Analysis of vocal tract shape and dimensions using magnetic resonance imaging: vowels

    Journal of the Acoustical Society of America

    (1991)
  • S. Baum et al.

    The neural bases of prosody: insights from lesion studies and neuroimaging

    Aphasiology

    (1999)
  • Bavelas, J., & Chovil, N., (1997). Faces in dialogue. In J. A. Russell, & J. M. Fernández-Dols (Eds.), The psychology...
  • R. Birn et al.

    Magnetic field changes in the human brain due to swallowing or speaking

    Magnetic Resonance in Medicine

    (1998)
  • R. Birn et al.

    Event-related fMRI of tasks involving brief motion

    Human Brain Mapping

    (1999)
  • T.A. Burnett et al.

    Voice F0 responses to manipulations in pitch feedback

    Journal of the Acoustical Society of America

    (1998)
  • D.E. Callan et al.

    An auditory-feedback based neural network model of speech production that is robust to developmental changes in the size and shape of articulatory systems

    Journal of Speech, Language & Hearing Research

    (2000)
  • D. Caplan

    Language

    (1992)
  • D.R. Corfield et al.

    Cortical and subcortical control of tongue movement in humans: a functional neuroimaging study using fMRI

    Journal of Applied Physiology

    (1999)
  • Cowie, R., & Douglas-Cowie, E. (1992). Trends in linguistics, studies and monographs. Postlingually acquired deafness...
  • O. Creutzfeldt et al.

    Neuronal activity in the human lateral temporal lobe. II. Responses to the subjects own voice

    Experimental Brain Research

    (1989)
  • Damasio, A. R. & Damasio, H. (1988). Advances in the neuroanatomical correlates of aphasia and the understanding of the...
  • A.R. Damasio et al.

    Aphasia with lesions in the basal ganglia and internal capsule

    Archives of Neurology

    (1982)
  • J. Dang et al.

    Morphological and acoustical analysis of the nasal and the paranasal cavities

    Journal of the Acoustical Society of America

    (1994)
  • Demolin, D., Metens, T., & Soquet, A. (2000). Real time MRI and articulatory coordinations in vowels. In Proceedings of...
  • N.F. Dronkers

    A new brain region for co-ordinating speech articulation

    Nature

    (1996)
  • Dronkers, N. F., Redfern, B. B., & Knight, R. T. (2000). The neural architecture of language disorders. In M. S....
  • G. Fant

    Acoustic theory of speech production

    (1950)
  • J.A. Fiez

    Phonology, semantics, and the role of the left inferior prefrontal cortex

    Human Brain Mapping

    (1997)
  • J.W. Folkins et al.

    Lip and jaw motor control during speech: responses to resistive loading of the jaw

    Journal of Speech & Hearing Research

    (1975)
  • P.T. Fox et al.

    Brain correlates of stuttering and syllable production

    Brain

    (2000)
  • V. Fromkin

    Speech errors as linguistic evidence

    (1973)
  • J.P. Gagne et al.

    Across talker variability in speech intelligibility for conversational and clear speech

    Journal of the Academy of Rehabilitative Audiology

    (1995)
  • Grabowski, T., & Damasio, A. (2000). Investigating language with functional imaging. In A. W. Toga, & J. C. Mazziotta...
  • V.L. Gracco et al.

    Dynamic control of the perioral system during speech: kinematic analyses of autogenic and nonautogenic sensorimotor processes

    Journal of Neurophysiology

    (1985)
  • V.L. Gracco et al.

    Central patterning of speech movements

    Experimental Brain Research

    (1988)
  • V.L. Gracco et al.

    Sensorimotor characteristics of speech motor sequences

    Experimental Brain Research

    (1989)
  • F. Guenther

    A neural network model of speech acquisition and motor equivalent production

    Biological Cybernetics

    (1994)
  • S. Hirano et al.

    Cortical speech processing mechanisms while vocalizing visually presented languages

    NeuroReport

    (1996)