Intonation - prosody

Introduction to prosody theories and models

© Robert Mannell, Macquarie University, 2007

What is prosody?

Prosody is the study of the tune and rhythm of speech and how these features contribute to meaning.

Prosody is the study of those aspects of speech that typically apply to a level above that of the individual phoneme and very often to sequences of words (in prosodic phrases). Features above the level of the phoneme (or "segment") are referred to as suprasegmentals. A phonetic study of prosody is a study of the suprasegmental features of speech.

At the phonetic level, prosody is characterised by:-

  • vocal pitch (fundamental frequency)
  • loudness (acoustic intensity)
  • rhythm (phoneme and syllable duration)

Phonetic studies of prosody often concentrate on measuring these characteristics.
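As a rough illustration of what such measurements involve, the sketch below computes the three quantities for a synthetic tone. It is only a minimal example under stated assumptions: the sample rate, the helper names (synth_tone, intensity_db, estimate_f0) and the simple autocorrelation search are all choices made for this illustration, and dedicated phonetic analysis tools use considerably more robust methods.

```python
import math

SAMPLE_RATE = 16000  # samples per second (an assumption for this sketch)


def synth_tone(freq_hz, amplitude, duration_s, sample_rate=SAMPLE_RATE):
    """Return a sine wave as a list of floats (a stand-in for a voiced sound)."""
    n = round(duration_s * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]


def duration_s(samples, sample_rate=SAMPLE_RATE):
    """Duration of the signal in seconds."""
    return len(samples) / sample_rate


def intensity_db(samples):
    """Root-mean-square level in dB relative to full scale (1.0)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms)


def estimate_f0(samples, sample_rate=SAMPLE_RATE, fmin=75.0, fmax=400.0):
    """Estimate fundamental frequency as the lag with the highest autocorrelation."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# A 200 Hz tone at half of full-scale amplitude, 50 ms long.
tone = synth_tone(200.0, 0.5, 0.05)
print("duration (s):", duration_s(tone))
print("intensity (dB re full scale):", round(intensity_db(tone), 2))
print("estimated F0 (Hz):", estimate_f0(tone))
```

Real speech is far less regular than a sine wave, so the pitch estimate in particular should be read as a demonstration of the idea rather than a usable pitch tracker.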

Prosody has been studied from numerous perspectives by people belonging to differing linguistic schools, and the approaches taken are correspondingly diverse. Different approaches examine prosody from the perspective of grammar, of discourse, of pragmatics, and of phonetics and phonology.

Prosody can be regarded as part of the grammar of a language. Discourse approaches examine the prosody of normal interactions rather than stylised, constructed, fluent, scripted interactions. Functionalist approaches integrate the study of prosody with the study of grammar and meaning in natural social interactions.

Pragmatics examines the distinction between the literal meaning of a sentence and the meaning intended by the speaker. Prosody can change the meaning of a sentence by indicating a speaker's attitude to what is being said (eg. it can indicate irony, sarcasm, etc.), particularly when prosody works in conjunction with the social/situational context of an utterance.

Prosody overlaps with emotion in speech. The same acoustic features that are used to express prosody (intensity, vocal pitch, rhythm, rate of utterance) are also affected by emotion in the voice. For example, I can simultaneously be sad and ironic or fearful and sarcastic.

Speech contains various levels of information that can be described as:-

  • Linguistic - direct expression of meaning
  • Paralinguistic - may indicate attitude or membership of a speech community
  • Non-linguistic - may indicate something about a speaker's vocal physiology, state of health or emotional state

Paralinguistic aspects of speech are those aspects that are not strictly linguistic, but which contribute to the meaning of an utterance. Paralinguistic features may help to indicate a speaker's attitude, although this may overlap with emotional aspects of speech.

Another paralinguistic aspect of speech is the set of features that indicate a speaker's membership of a speech community. These are effectively sociolinguistic markers of speaker identity (eg. Australian versus New Zealand pronunciations, styles of speech of farmers versus bankers, etc.).

Some speech communities might prefer broader pronunciations. Some speech communities might prefer more nasal voices. Some speech communities might speak louder or faster.

Gender has both paralinguistic and non-linguistic aspects. Some features may be regarded as more masculine or feminine by a particular speech community (eg. degree of pharyngealisation in Arabic).

Features that are purely a consequence of physiological differences, however, are non-linguistic aspects of speech.

A speaker's emotional state is often evident in the speaker's voice. These features are linguistic to the extent that they are relevant to the meaning of the current utterance. On the other hand, our current emotional state might be a non-linguistic undertone to what is being said (ie. if it's not very relevant to what's being said).

Our state of health can be evident in our speech. This would be a non-linguistic aspect of our speech. Note, however, that even this distinction can blur when the health issue is cognitive and affects the expression of meaning.

Segmental and suprasegmental features of speech are both affected by linguistic, paralinguistic and non-linguistic forces.

The main acoustic correlates of prosody (rhythm, intensity and fundamental frequency) are also correlates of paralinguistic and non-linguistic phenomena, particularly emotion.

Schools of prosody

There have been many theoretical approaches to prosody. The earliest such schools dealt with the metrical structure of poetic verse (eg. the ancient Greeks).

Often the British and American approaches to prosody are contrasted, but this dichotomy is a simplification of the diversity of theoretical and experimental perspectives.

British schools

Crombie (1987) listed the following three British approaches to intonation:-

  • syntactic approach
  • affective or attitudinal approach
  • discoursal approach

Crombie (1987) states that the British schools have the following elements in common:-

  • "dividing the flow of speech into tone groups or tone units (tonality)"
  • "locating the syllables on which major movements of pitch occur (tonicity)"
  • "identifying the direction of pitch movements (tone)"

British schools tend to focus on pitch contours or tunes whilst American schools tend to focus on pitch levels. Different tunes are associated with different meanings.

Central to British models of prosody is the idea of the "tone group".

A tone group is a sequence of speech dominated by a prominent or accented word. The accented word is the focal point for the tonal characteristics of the tone group. It contains the strongest, most prominent syllable (usually its primary stressed syllable). The accented syllable, or rather the strongest syllable in the accented word, is often referred to as the nuclear syllable or the tonic syllable. A tone group can contain one or more rhythmic feet. Each foot is dominated by a stressed syllable. In English a foot starts with a stressed syllable and ends with the last unstressed syllable before the next stress.
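The foot rule described above can be sketched as a small grouping procedure. This is only an illustration: the function name group_into_feet, the syllable division and the stress marks in the example are assumptions made here, and the handling of unstressed syllables that precede the first stress (kept as a separate leading group) is one possible choice that the description above does not settle.

```python
def group_into_feet(syllables):
    """Group (syllable, is_stressed) pairs into rhythmic feet.

    A new foot starts at every stressed syllable and runs up to the last
    unstressed syllable before the next stress. Unstressed syllables that
    come before the first stress are kept together as a leading group
    (an assumption of this sketch).
    """
    feet = []
    for text, stressed in syllables:
        if stressed or not feet:
            feet.append([text])      # start a new foot
        else:
            feet[-1].append(text)    # attach to the current foot
    return feet


# Hypothetical stress marking for "Bill is coming if he's allowed."
utterance = [
    ("Bill", True), ("is", False),
    ("com", True), ("ing", False), ("if", False), ("he's", False), ("al", False),
    ("lowed", True),
]

for foot in group_into_feet(utterance):
    print(" ".join(foot))
```

With this stress marking the utterance falls into three feet, each beginning with a stressed syllable.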

As an example of a British school we will examine the approach of Michael Halliday and Systemic-Functional linguistics.


"It is not enough to treat intonation systems as if they merely carried a set of emotional nuances ... English intonation contrasts are grammatical" (Halliday, 1967:10)

In contrast, Pike (1945:21), a founder of the American school said that intonation "... is merely a shade of meaning ... superimposed upon ... intrinsic lexical meaning according to the attitude of the speaker".

A consequence of Halliday's view of intonation was that, being a part of grammar, it should be analysed in the same way as other grammatical systems. Halliday utilises the British concept of tunes which extend across a section of text. These tunes have a "nucleus" which is the "first (salient) syllable in the tonic foot".

Tonality, according to Halliday, is related to the number of tone groups in an utterance and each such tone group is seen as one "move" in a speech act. Tone is "... a complex pattern built out of a simple opposition between certain and uncertain polarity." (Halliday, 1967:30)

Halliday describes 5 simple and 2 compound primary tones for English. They are:-

  • Tone 1 - falling
  • Tone 2 - high rising
  • Tone 3 - low rising
  • Tone 4 - falling-rising
  • Tone 5 - rising-falling
  • Tone 13 - falling plus low rising
  • Tone 53 - rising-falling plus low rising
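The labels of the two compound tones can be read as sequences of simple tone numbers (13 as tone 1 followed by tone 3, 53 as tone 5 followed by tone 3). The sketch below represents the inventory as a lookup table and builds the compound descriptions from the simple ones. The names SIMPLE_TONES and describe_tone are illustrative only and are not part of Halliday's notation.

```python
# Halliday's five simple primary tones, keyed by their number.
SIMPLE_TONES = {
    "1": "falling",
    "2": "high rising",
    "3": "low rising",
    "4": "falling-rising",
    "5": "rising-falling",
}


def describe_tone(label):
    """Describe a simple tone ("1") or a compound tone ("13") by its parts."""
    return " plus ".join(SIMPLE_TONES[digit] for digit in label)


for label in ["1", "2", "3", "4", "5", "13", "53"]:
    print(f"Tone {label} - {describe_tone(label)}")
```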

"If polarity is certain, the pitch of the tonic falls; if uncertain, it rises." (Halliday, 1967:30) Polarity refers to the truth of a statement ("true" or "false" in fact or in belief) or to whether something is "known" versus "unknown". From these tones and the idea of polarity, Halliday builds up a complex pattern of relationships between tone and meaning.

  • Tone 1: falling tone - "polarity known ... the unmarked realisation of a statement" (also a question with known polarity)
  • Tone 2: rising tone - "polarity unknown ... the unmarked realisation of a yes-no question"
  • Tone 3: low rising - "not yet decided whether known or unknown ... dependent on something else"
  • Tone 4: falling-rising - "seems certain, but turns out not to be. It is associated with reservations and conditions"
  • Tone 5: rising-falling - "seems uncertain, but turns out to be certain. It is used on strong, especially contradicting assertions ... It often carries an implication of 'you ought to know that'"

(the above is from Halliday, 1985, 281-282)

Some examples:-

  • Tone 1 (falling) "That's a dog." - statement
  • Tone 1 (falling) "Is Fido a dog?" - question with known polarity
  • Tone 2 (rising) "Are you coming?" - I don't know if you are coming but want to know. cf. Tone 1 (falling) "Are you coming?" - this is a bit more like a command.
  • Tone 3 (low-rising) "I think I'll come tomorrow." - but not really sure.
  • Tone 4 (falling-rising) "Bill is coming if he's allowed." - conditional statement.
  • Tone 5 (rising-falling) "You ought to know that."

Tone in intonation and lexical tone

The use of the word "tone" in some theories of intonation and prosody needs to be clarified.

This usage must not be confused with lexical tone in tone languages, where changing the pitch contour of a word changes its meaning. For example, changing the tone on "ma" in Mandarin Chinese may change the meaning from "horse" to "mother". That is, changing the tone means that you have selected a different word.

Lexical tone in tone languages is usually attached to a single syllable.

Prosodic tone is attached to a higher level entity such as a tone group (a phrase or sentence characterised by a particular prosodic pattern). Occasionally a tone group might only consist of a single word, which might in turn be a single syllable, but very often it consists of more than one word.
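The difference can be pictured as a difference in what the tone is a property of. In the minimal sketch below, lexical tone is part of the key used to look up a word, whereas prosodic tone is stored on a whole tone group and leaves the words unchanged. The names LEXICON, lookup and ToneGroup are hypothetical, and the tone numbers 1 to 4 follow the conventional numbering of Mandarin tones.

```python
from dataclasses import dataclass

# Lexical tone: the tone is part of the word's identity.
# Keys are (syllable, Mandarin tone number); values are English glosses.
LEXICON = {
    ("ma", 1): "mother",
    ("ma", 2): "hemp",
    ("ma", 3): "horse",
    ("ma", 4): "scold",
}


def lookup(syllable, tone):
    """Changing the tone selects a different word."""
    return LEXICON[(syllable, tone)]


# Prosodic tone: the tone belongs to the tone group, not to any one word.
@dataclass
class ToneGroup:
    words: list
    tone: str  # e.g. "falling" or "high rising"


statement = ToneGroup(words=["Are", "you", "coming"], tone="falling")
question = ToneGroup(words=["Are", "you", "coming"], tone="high rising")

print(lookup("ma", 1), "vs", lookup("ma", 3))
print(statement.words == question.words, statement.tone != question.tone)
```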

American schools

American schools of prosody are often described as relying on a phonemic or levels approach to intonation. For example, Bloomfield (1933) referred to "differences of pitch ... as secondary phonemes" (but note that Bloomfield, like the British, used pitch contours rather than pitch levels).

Pike (1945) used:-

  • pitch heights to characterise intonation contours (contours are sequences of pitch height)
  • a systematic approach to speaker attitude
  • the interdependence of intonation, stress, quantity, tempo, rhythm and voice quality

(the above summary is after Chun (2002))

Pike (1945) utilised four levels of pitch because "four levels are enough to provide for the writing and distinguishing of all the contours which have differences of meaning so far discovered." "These four levels may, for convenience, be labeled extra-high, high, mid and low respectively..." (Pike, 1945)
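A levels-based description treats a contour as a sequence of pitch heights. The sketch below shows one way to represent that idea. It assumes that the levels are numbered 1 (extra-high) to 4 (low), so that a larger number means a lower pitch; the names PIKE_LEVELS, describe_contour and contour_direction are made up for this illustration.

```python
# Assumed numbering: 1 is the highest level, 4 the lowest.
PIKE_LEVELS = {1: "extra-high", 2: "high", 3: "mid", 4: "low"}


def describe_contour(levels):
    """Spell out a contour, given as a sequence of level numbers."""
    return " to ".join(PIKE_LEVELS[level] for level in levels)


def contour_direction(levels):
    """Overall direction from the first level to the last."""
    first, last = levels[0], levels[-1]
    if last > first:      # larger number = lower pitch
        return "falling"
    if last < first:
        return "rising"
    return "level"


for contour in [(2, 4), (3, 2), (3, 3)]:
    print(contour, describe_contour(contour), contour_direction(contour))
```

Only the end points are compared here, so a contour that rises and then falls back to its starting level would be reported as "level"; a fuller treatment would look at each step in the sequence.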

The ToBI framework for transcribing prosody (eg. Pierrehumbert and Beckman (1988), Beckman et al. (2005)) is an intonation transcription system based on two relative levels (low and high). ToBI is particularly suited to phonetic analyses of prosody but increasingly it is used in studies of prosody and meaning.

The remainder of this topic will concentrate on the ToBI framework of Pierrehumbert, Beckman and others.

ToBI is dealt with on the ToBI introduction page.


These texts were referred to above, but are not required reading.

  1. Beckman, M. E., Hirschberg, J., & Shattuck-Hufnagel, S. (2005). "The original ToBI system and the evolution of the ToBI framework". In S.-A. Jun, ed., Prosodic Typology: The Phonology of Intonation and Phrasing, pp. 9-54. Oxford University Press.
  2. Chun, D.M. (2002) Discourse Intonation in L2: From theory and research to practice, Amsterdam: John Benjamins.
  3. Crombie, W. (1987) "Intonation in English: A systematic perspective".
  4. Halliday, M.A.K. (1967) Intonation and grammar in British English, The Hague: Mouton.
  5. Halliday, M.A.K. (1985) An introduction to Functional Grammar, London: Edward Arnold.
  6. Pierrehumbert, J. B., & Beckman, M. E. (1988). Japanese Tone Structure (Linguistic Inquiry Monograph Series No. 15). MIT Press.
  7. Pike, K.L. (1945) The intonation of American English, Ann Arbor: University of Michigan Press.

Content owner: Department of Linguistics Last updated: 12 Mar 2020 12:17pm