Phoneme and allophone

Robert Mannell, Macquarie University, 2008


The course material for this topic is divided into the sub-topics below.

Additional reading

Students should also read the following: Clark, Yallop & Fletcher (2007) (Chapter 4)


Trubetzkoy (1939) wrote

"It is the task of phonology to study which differences in sound are related to differences in meaning in a given language, in which way the discriminative elements ... are related to each other, and the rules according to which they may be combined into words and sentences."

Linguistic units which cannot be substituted for each other without a change in meaning can be referred to as linguistically contrastive or significant units. Such units may be phonological, morphological, syntactic, semantic etc.

Logically, this takes the form:-

  IF unit X in context A GIVES meaning 1
  AND IF unit Y in context A GIVES meaning 2
  THEN unit X AND unit Y belong to separate linguistic units
eg. IF sound [k] in context [_æt] GIVES meaning "cat"
  AND IF sound [m] in context [_æt] GIVES meaning "mat"
  THEN sound [k] and sound [m] belong to separate linguistic units


Phonemes are the linguistically contrastive or significant sounds (or sets of sounds) of a language. Such a contrast is usually demonstrated by the existence of minimal pairs or contrast in identical environment (C.I.E.). Minimal pairs are pairs of words which vary only by the identity of the segment (another word for a single speech sound) at a single location in the word (eg. [mæt] and [kæt]). If two segments contrast in identical environment then they must belong to different phonemes. A paradigm of minimal phonological contrasts is a set of words differing only by one speech sound. In most languages it is rare to find a paradigm that contrasts a complete class of phonemes (eg. all vowels, all consonants, all stops etc.).

eg. the English stop consonants could be defined by the following set of minimally contrasting words:-

i) /pɪn/ vs /bɪn/ vs /tɪn/ vs /dɪn/ vs /kɪn/

Only /ɡ/ does not occur in this paradigm and at least one minimal pair must be found with each of the other 5 stops to prove conclusively that it is not a variant form of one of them.

ii) /ɡɐn/ vs /pɐn/ vs /bɐn/ vs /tɐn/ vs /dɐn/

Again, only five stops belong to this paradigm. A single minimal pair contrasting /ɡ/ and /k/ is required now to fully demonstrate the set of English stop consonants.

iii) /ɡæɪn/ vs /kæɪn/
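The minimal pair test used in paradigms i) to iii) can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original course material: the function name `minimal_pairs` and the representation of each word as a tuple of segments (so that a diphthong such as /æɪ/ counts as one segment) are assumptions made for the example.

```python
from itertools import combinations

# The twelve transcriptions from paradigms i), ii) and iii) above.
# Each word is a tuple of segments; the diphthong "æɪ" is one segment.
WORDS = [
    ("p", "ɪ", "n"), ("b", "ɪ", "n"), ("t", "ɪ", "n"), ("d", "ɪ", "n"), ("k", "ɪ", "n"),
    ("ɡ", "ɐ", "n"), ("p", "ɐ", "n"), ("b", "ɐ", "n"), ("t", "ɐ", "n"), ("d", "ɐ", "n"),
    ("ɡ", "æɪ", "n"), ("k", "æɪ", "n"),
]

STOPS = ["p", "b", "t", "d", "k", "ɡ"]


def minimal_pairs(words):
    """Return the set of segment pairs shown to contrast by a minimal pair."""
    contrasts = set()
    for w1, w2 in combinations(words, 2):
        if len(w1) != len(w2):
            continue
        # Collect the positions at which the two words differ.
        diffs = [(a, b) for a, b in zip(w1, w2) if a != b]
        if len(diffs) == 1:  # exactly one differing segment = minimal pair
            contrasts.add(frozenset(diffs[0]))
    return contrasts


contrasts = minimal_pairs(WORDS)
# Stop pairs for which no minimal pair has been found in the data.
missing = [pair for pair in combinations(STOPS, 2) if frozenset(pair) not in contrasts]
print(missing)
```

With all three paradigms included, every pair of stops is supported by at least one minimal pair. If the two words of paradigm iii) are removed, the pair /k/ vs /ɡ/ is left unsupported, which is why that extra minimal pair is needed.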

Sometimes it is not possible to find a minimal pair which would support the contrastiveness of two phonemes and it is necessary to resort to examples of contrast in analogous environment (C.A.E.). A C.A.E. pair is almost a minimal pair; however, the two words differ by more than just the pair of sounds in question. Preferably, the other points of variation in the pair of words are as remote as possible (never adjacent and preferably not in the same syllable) from the environment of the pair of sounds being tested. eg. /ʃ/ vs /ʒ/ in English are usually supported by pairs such as "pressure" [preʃə] vs "treasure" [treʒə], where only the initial consonants differ and are sufficiently remote from the opposition being examined to be considered unlikely to have any conditioning effect on the selection of phones. The only true minimal pairs for these two sounds in English involve at least one word (often a proper noun) that has been borrowed from another language (eg. "Confucian" [kənfjʉːʃən] vs "confusion" [kənfjʉːʒən], and "Aleutian" [əlʉːʃən] vs "allusion" [əlʉːʒən]).

A syntagmatic analysis of a speech sound, on the other hand, describes the unit's distribution within a language. In other words, it indicates all of the locations or contexts within the words of a particular language where the sound can be found.

For example, a syntagm of the phone [n] in English could be in the form:-
( #CnV..., #nV..., ...Vn#, ...VnC#, ...VnV..., etc.)

whilst [ŋ] in English would be:-
(...Vŋ#, ...VŋC#, ...VŋV..., etc)

but would not include the word initial forms of the kind described for [n].

Note that in the above examples, "#" is used to represent a word or syllable boundary, "V" represents any vowel, and "C" represents another consonant.

For example, words of the type "#CnV..." include "snow" [snəʉ], "snort" [snoːt] and "snooker" [snʉːkə]. In this case, the only consonant (for English) that can occupy the initial "C" slot is the phoneme /s/, and so the generalised pattern could be rewritten as "#snV...".
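A syntagm of this kind can be collected mechanically from a word list. The sketch below is illustrative only: the function name `environments`, the small word list and the transcriptions [sɪŋ] "sing" and [sɪŋə] "singer" are assumptions made for the example, and a real analysis would need a far larger sample.

```python
# Segments treated as vowels in this small sample (an assumption of the sketch).
VOWELS = {"əʉ", "oː", "ʉː", "ə", "ɪ"}

# "snow", "snort", "snooker", "pin", "sing", "singer" as tuples of segments.
WORDS = [
    ("s", "n", "əʉ"),
    ("s", "n", "oː", "t"),
    ("s", "n", "ʉː", "k", "ə"),
    ("p", "ɪ", "n"),
    ("s", "ɪ", "ŋ"),
    ("s", "ɪ", "ŋ", "ə"),
]


def classify(segment):
    """Reduce a segment to '#', 'V' or 'C'."""
    if segment == "#":
        return "#"
    return "V" if segment in VOWELS else "C"


def environments(phone, words):
    """Return the generalised contexts (left_right) in which `phone` occurs."""
    found = set()
    for word in words:
        padded = ("#",) + tuple(word) + ("#",)  # mark the word boundaries
        for i, segment in enumerate(padded):
            if segment == phone:
                found.add(f"{classify(padded[i - 1])}_{classify(padded[i + 1])}")
    return found


print(environments("n", WORDS))
print(environments("ŋ", WORDS))
```

In this sample [n] is found after a consonant before a vowel and word finally, whilst [ŋ] is found only after a vowel. No word-initial context is returned for [ŋ], in line with the syntagms given above.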


Allophones are the linguistically non-significant variants of each phoneme. In other words a phoneme may be realised by more than one speech sound and the selection of each variant is usually conditioned by the phonetic environment of the phoneme. Occasionally allophone selection is not conditioned but may vary from person to person and occasion to occasion (ie. free variation).

A phoneme is a set of allophones or individual non-contrastive speech segments. Allophones are sounds, whilst a phoneme is a set of such sounds.

Allophones are usually relatively similar sounds which are in mutually exclusive or complementary distribution (C.D.). The C.D. of two phones means that the two phones can never be found in the same environment (ie. the same environment in the senses of position in the word and the identity of adjacent phonemes). If two sounds are phonetically similar and they are in C.D. then they can be assumed to be allophones of the same phoneme.

eg. in many languages voiced and voiceless stops with the same place of articulation do not contrast linguistically but are rather two phonetic realisations of a single phoneme (ie. /p/=[p,b],/t/=[t,d], and /k/=[k,ɡ]). In other words, voicing is not contrastive (at least for stops) and the selection of the appropriate allophone is in some contexts fully conditioned by phonetic context (eg. word medially and depending upon the voicing of adjacent consonants), and is in some contexts either partially conditioned or even completely unconditioned (eg. word initially, where in some dialects of a language the voiceless allophone is preferred, in others the voiced allophone is preferred, and in others the choice of allophone is a matter of individual choice).

eg. Some French speakers choose to use the alveolar trill [r] when in the village and the more prestigious uvular trill [ʀ] when in Paris. Such a choice is made for sociological reasons.

Phonetic similarity

Allophones must be phonetically similar to each other. In analysis, this means you can assume that highly dissimilar sounds are separate phonemes (even if they are in complementary distribution). For this reason no attempt is made to find minimal pairs which contrast vowels with consonants. Exactly what can be considered phonetically similar may vary somewhat from language family to language family and so the notion of phonetic similarity can seem to be quite unclear at times. Sounds can be phonetically similar from both articulatory and auditory points of view and for this reason one often finds a pair of sounds that vary greatly in their place of articulation but are sufficiently similar auditorily to be considered phonetically similar (eg. [h] and [ç] are voiceless fricatives which are distant in terms of glottal and palatal places of articulation, but which nevertheless are sufficiently similar auditorily to be allophones of a single phoneme in some languages such as Japanese).

eg. In English, /h/ and /ŋ/ are in complementary distribution. /h/ only ever occurs at the beginning of a syllable (head, heart, enhance, perhaps) whilst /ŋ/ only ever occurs at the end of a syllable (sing, singer, finger). They are, however, so dissimilar that no one regards them as allophones of the one phoneme. They vary in place and manner of articulation, as well as voicing. Further, the places of articulation (glottal vs velar) are quite remote from each other and /h/ is oral whilst /ŋ/ is nasal.

According to Hockett (1942), "...if a and b are members of one phoneme, they share one or more features". Phonetic similarity is therefore based on the notion of shared features. Such judgments of similarity will vary from language to language and there are no universal criteria of similarity.

The following pairs of sounds might be considered to be similar.

i) two sounds differing only in voicing:
[pb] [td] [kɡ] [ɸβ] [θð] [sz] [ʃʒ] [xɣ] etc...

ii) two sounds differing in manner of articulation only as plosive vs fricative. The sibilant or grooved fricatives [s,z,ʃ,ʒ] are excluded from this category as they are quite different auditorily from the other ("central") fricatives.
[pɸ] [kx] [bβ] [ɡɣ] etc...

iii) Any pairs of consonants close in place of articulation and differing in no other contrastive feature:
[sʃ] [zʒ] [nɲŋ] [lɭ] [lʎ] [mɱ], etc...

iv) Any other pairs of consonants which are close in articulation and differ by one other feature but are nevertheless frequently members of the same phoneme
[lɹ] [cɡ] [tθ] [dð]

In languages where voicing is non-contrastive, two phones differing in voicing and only slightly in place of articulation might be considered similar (eg. [cɡ] etc.)

Further, for the purposes of this type of analysis, the place of articulation of the apicodental fricatives [θ,ð] is considered to be close enough to that of the alveolar stops [t,d] to be considered phonetically similar.

v) Any two vowels differing in only one feature or articulated with adjacent tongue positions
[æ ɐ] [i ɪ] [ɐː ɐ] [i y] [ɑ ɑ̃]
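Hockett's notion of shared features can be illustrated by listing the features on which two consonants differ. The sketch below is only a rough aid and not a universal measure of similarity: the three-feature description (place, manner, voicing), the table `FEATURES` and the function name `differing_features` are assumptions made for the example.

```python
# A deliberately small feature table using conventional IPA descriptions.
FEATURES = {
    "p": {"place": "bilabial", "manner": "plosive", "voicing": "voiceless"},
    "b": {"place": "bilabial", "manner": "plosive", "voicing": "voiced"},
    "ɸ": {"place": "bilabial", "manner": "fricative", "voicing": "voiceless"},
    "k": {"place": "velar", "manner": "plosive", "voicing": "voiceless"},
    "h": {"place": "glottal", "manner": "fricative", "voicing": "voiceless"},
    "ŋ": {"place": "velar", "manner": "nasal", "voicing": "voiced"},
}


def differing_features(a, b):
    """Return the names of the features on which phones a and b differ."""
    return {name for name in FEATURES[a] if FEATURES[a][name] != FEATURES[b][name]}


print(differing_features("p", "b"))  # type i): voicing only
print(differing_features("p", "ɸ"))  # type ii): plosive vs fricative only
print(differing_features("h", "ŋ"))  # place, manner and voicing all differ
```

The pairs [pb] and [pɸ] each differ in a single feature, whereas [h] and [ŋ] differ in all three, which matches the judgment that they are too dissimilar to be allophones of one phoneme. A count of this kind says nothing about auditory similarity, so it cannot by itself decide any particular case.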

Although it is implied above that the notion of "phonetic similarity" is in some way less linguistically abstract (more phonetic?) than the notion of complementary distribution, it is, nevertheless, a quite abstract concept. There are no obvious and consistent acoustic, auditory or articulatory criteria for phonetic similarity. Further, since a notion of similarity implies a continuum, the following question must be asked of two phones in complementary distribution: how similar must they be before they are to be considered members of the same phoneme?

There are many examples of very similar phones which are perceived by native speakers to belong to separate phonemes. In English, for example, a word terminal voiceless stop may be either released and aspirated or unreleased. The homorganic (1) voiced stop may also be released or unreleased. Often the unreleased voiced and voiceless stops may actually be identical in every way except that the preceding vowel is lengthened before the phonologically voiced stop. In terms of phonetic similarity, the two unreleased stops may actually be identical and yet be perceived by native speakers to belong to different phonemes.

For example:-
/kɐp/→[kɐpʰ] ... [kɐp̚]
/kɐb/→[kɐˑb] ... [kɐˑb̚] ... [kɐˑp̚]
(nb. " ̚ " means unreleased stop and " ˑ " means partially lengthened vowel)

Conversely, phones which are very dissimilar (at least from certain perspectives) may be felt by native speakers to belong to a single phoneme.

eg. Japanese(2) /h/ → [ɸ] before /u/     eg. [ɸuku] "luck"
                      [ç] before /i/     eg. [çito] "man"
                      [h] before /e,a,o/ eg. [hana] "flower"

From an articulatory perspective, these phones seem very dissimilar (bilabial, palatal, and glottal) being produced at the extreme ends of the vocal tract. They are, however, relatively similar acoustically and auditorily (they are all relatively weak voiceless fricatives). This kind of phonetic similarity is listener orientated rather than speaker orientated.
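The complementary distribution of the three Japanese phones can be checked mechanically by comparing the contexts in which each one occurs. The sketch below is illustrative only: the function names are assumptions, it looks only at the immediately following segment, and three words are far too few for a real analysis.

```python
def following_contexts(phone, words):
    """Return the set of segments found immediately after `phone` ('#' = word end)."""
    contexts = set()
    for word in words:
        for i, segment in enumerate(word):
            if segment == phone:
                contexts.add(word[i + 1] if i + 1 < len(word) else "#")
    return contexts


def complementary(a, b, words):
    """True if a and b are both attested and never share a following context."""
    ctx_a = following_contexts(a, words)
    ctx_b = following_contexts(b, words)
    return bool(ctx_a) and bool(ctx_b) and ctx_a.isdisjoint(ctx_b)


# The three Japanese words given above, as tuples of segments.
JAPANESE = [("ɸ", "u", "k", "u"), ("ç", "i", "t", "o"), ("h", "a", "n", "a")]
# "cat" and "mat": [k] and [m] contrast in an identical environment.
ENGLISH = [("k", "æ", "t"), ("m", "æ", "t")]

print(complementary("ɸ", "ç", JAPANESE))
print(complementary("ç", "h", JAPANESE))
print(complementary("k", "m", ENGLISH))
```

The check succeeds for each pair of the Japanese phones and fails for English [k] and [m], which share the context [_æt]. Complementary distribution alone is not sufficient evidence, since the phones must also be phonetically similar.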

eg. English /t/ → [ʔ] medially and finally in some dialects (eg. Cockney "butter", "wait")
                  [t] initially
nb. /k/ → [k,ʔ] does not occur, although [k] and [ʔ] are articulatorily closer

Phonemic pattern

A pair of phones in complementary distribution may sometimes be classified into separate phonemes on the basis of phonemic pattern. In other words, is there a group of phonemes which exhibit a similar pattern of distribution (eg. clustering behaviour, morphology, etc.) to one of the phones being examined? In the case of the pair [h], [ŋ] there are some similarities in patterning between [h] and certain fricatives, and between [ŋ] and the nasals.

For example, there is a prefix which when placed before a word commencing with a stop has the effect of negating the original meaning. The prefix has the form /ɪ/ plus the nasal homorganic with the stop.

ie. "impossible" [ɪmp...]
  "intolerable" [ɪnt...]
  "incalculable" [ɪŋk...] or [ɪnk...]
  (free variation in citation form, but homorganic predominating in rapid speech)

Clearly, this pattern suggests that [ŋ] in some instances follows the same phonological pattern as the other nasals. It does in fact raise the question of whether [ŋ] is an allophone of /n/. This was indeed the case until the 1600s, but quite a few minimal pairs have since crept into the language (eg. "sin"/"sing", "run"/"rung").
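The selection of the homorganic nasal in this prefix can be sketched as a simple lookup on the place of articulation of the following stop. The sketch is illustrative only: the table and function names are assumptions, and it models only the homorganic variant, ignoring the free variation between [ɪŋk...] and [ɪnk...] noted above.

```python
# Place of articulation of the English stops.
STOP_PLACE = {
    "p": "bilabial", "b": "bilabial",
    "t": "alveolar", "d": "alveolar",
    "k": "velar", "ɡ": "velar",
}

# The nasal produced at each of those places.
NASAL_AT_PLACE = {"bilabial": "m", "alveolar": "n", "velar": "ŋ"}


def negative_prefix(initial_stop):
    """Return /ɪ/ plus the nasal homorganic with the stop that begins the stem."""
    return "ɪ" + NASAL_AT_PLACE[STOP_PLACE[initial_stop]]


for word, stop in [("possible", "p"), ("tolerable", "t"), ("calculable", "k")]:
    print(word, negative_prefix(stop))
```

Because [ŋ] is selected by exactly the same rule as [m] and [n], the sketch treats it as one of the nasals, which is the patterning argument made above.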

Phonological space

The greater the distance between a phoneme and its nearest neighbours, the greater the scope for allophonic variation. In other words, the larger the number of redundant features (ie. features which when changed will not create another phoneme) the greater the number of allophones which can actually occur.

eg. English /p/ [-voice] [+bilabial] [+stop] [+/-aspiration]

(nb. + indicates that a feature is present, - indicates that a feature is absent, +/- indicates that a feature is optional)

Changing the feature [-voice] to [+voice] will create /b/, changing the feature [bilabial] will create /t,k/ (or potential allophones of them) and changing the feature [stop] will create /w,f,m/. The only feature with complete freedom of movement is aspiration, and variation of this feature does indeed create the main pair of allophones of this phoneme in English.

eg. English /r/ [ɹ] alveolar approximant
  [ɹ̥] voiceless alveolar approximant (after voiceless sounds)
  [ɻ] retroflex approximant (West England)
  [ɾ] alveolar flap (Scottish) eg. [ɡɾɪn]
  [ʁ] uvular fricative (Tyneside)

The possible varieties of /r/ seem to include variations of manner, place and voicing. The only restriction is that its allophones may not overlap with those of /l/ and /w/.

The premises of practical phonemics

(This section is after Pike (1947) (chapter 4, pp 57-66); all text below in quotes has been taken from this source)

This section examines some of the basic assumptions behind phonemic analysis. The first four premises are particularly important to remember during the process of phonemic analysis.

"Phonemic analysis cannot be made with phonetic data alone; it must be made with phonetic data plus a series of phonemic premises and procedures".(p65)

"Phonemic procedures... must be founded upon premises concerning the underlying universal characteristics of languages of the world... ." (p57)

1. "Sounds tend to be modified by their environments" (coarticulation, producing allophones)

The actual details of these processes vary from language to language.

2. "Sound systems have a tendency towards phonetic symmetry"

eg. IF there is unequivocal evidence that [p] vs [b] and [k] vs [ɡ] are separate phonemes THEN it is likely that [t] vs [d] are also separate phonemes

3. "Sounds tend to fluctuate"

Free variation of allophones, eg. sometimes /tas/ = [tas] and sometimes /tas/ = [das]

4. "Characteristic sequences of sounds exert structural pressure on the phonemic interpretation of suspicious segments or suspicious sequences of segments"

For example, in the interpretation of syllable structure:-

  eg1. [ma] "cat"
  [bo] "to run"
  [su] "sky"
  [sa] "leaf"
  [ia] "moon"
  [tsa] "ten"

If in all non-suspicious words the syllable structure was found to be CV then

  [ia] → /ja/
  [tsa] → /t͡sa/

which would agree with the CV structure.

  eg2. [maba] "dog"
  [nasaɡ] "elephant"
  [saplam] "egg"
  [pasak] "to eat"

All clear syllable initials are found at the start of words and are always $CV... . All clear syllable finals are found at the end of words and are either ..VC$ or ..V$. There are no unambiguous examples of CC clusters at the start or end of a syllable, so the most likely analysis of [saplam] is to place the syllable boundary between the two medial consonants, thus /sap$lam/. In the cases of [maba], [nasaɡ] and [pasak] the most satisfactory syllabification is to place the medial consonant in the second syllable. Placing it at the end of the first syllable would require an additional syllable-initial pattern $V..., which is not unambiguously attested (ie. no words begin with a vowel).
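The syllabification argued for in eg2 can be sketched as a procedure that gives every non-initial vowel exactly one preceding consonant as its onset. This is only a sketch under the assumptions of that example (every syllable begins with a single consonant, the only vowel in the data is [a], and each character is one segment); the function name `syllabify` is hypothetical.

```python
VOWELS = {"a"}  # the only vowel in the eg2 data


def syllabify(segments):
    """Insert '$' before the single consonant that precedes each non-initial vowel."""
    boundaries = []
    for i, segment in enumerate(segments):
        # A vowel preceded by a consonant takes that consonant as its onset.
        if segment in VOWELS and i > 0 and segments[i - 1] not in VOWELS:
            onset_start = i - 1
            if onset_start > 0:  # no boundary at the very start of the word
                boundaries.append(onset_start)
    pieces, previous = [], 0
    for boundary in boundaries:
        pieces.append("".join(segments[previous:boundary]))
        previous = boundary
    pieces.append("".join(segments[previous:]))
    return "$".join(pieces)


for word in ["maba", "nasaɡ", "saplam", "pasak"]:
    print(word, syllabify(word))
```

For [saplam] the procedure leaves [p] at the end of the first syllable, since only one consonant may begin the second, giving /sap$lam/. For the other three words the medial consonant is placed in the second syllable, as argued above.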

Some extra premises (Pike lists more but these are the most important)

1. "Every language has consonants and vowels"

2. "Certain kinds of segments may be vowels in one language but consonants in another."
eg. [ia] → /ia/ in language 1 (L1) but [ia] → /ja/ in language 2 (L2)

3. "The dichotomy between vowel and consonant is not strictly an articulatory one but is in part based on distributional characteristics."

4. "A long vowel or consonant may in some languages constitute two phonemes."
eg. [aː] → /a/ in L1 and /aa/ in L2

5. "A sequence of two segments may in some languages constitute a single phonetically complex phoneme."
eg. [atsa] → /at$sa/ in L1 and /at͡sa/ in L2 (nb. $ = syllable boundary)
It may be that L2 only allows open syllables (V and CV) and so the L1 form would be illegal.

6. "Some segments may be non-significant transition sounds"
eg. in English /eɡ/ may be [ʔeɡ], where the glottal stop is phonemically non-significant.

7. "If two segments are sub-members of a single phoneme, the NORM of the phoneme is that sub-member [allophone] which is least limited in its distribution and least modified by its environments."
eg. /n/ → [ŋ] /__ {k/ɡ} and [n] elsewhere (here, [ŋ] is clearly an environmental modification)

8. "In order to be considered sub-members of a single phoneme, two segments must be (a) phonetically similar and (b) mutually exclusive as to the environments in which they occur."

9. "When two phonemic conclusions each appear to be justifiable by the other premises, and each seems to account for all the available facts of all types, that conclusion is assumed to be correct (a) which is the least complex, and (b) which gives to suspicious data an analysis parallel with analogous non-suspicious data, and (c) which appears most plausible in terms of alleged [coarticulations in] specific environments."


The following books/papers were referred to but aren't required reading.

Hockett, C.F. (1942) "A System of Descriptive Phonology", Language, 18(1), 3-21

Pike, K.L. (1947) Phonemics: A Technique for Reducing Languages to Writing, Ann Arbor: University of Michigan Press.

Trubetzkoy, N.S. (1939) "Grundzüge der Phonologie", Travaux du Cercle Linguistique de Prague 7. Reprinted 1958, Göttingen: Vandenhoeck & Ruprecht. Translated into English by C.A.M. Baltaxe 1969 as Principles of Phonology, Berkeley: University of California Press.


1. Homorganic sounds are two or more sounds that have the same place of articulation but which differ in manner of articulation or voicing.

2. This pattern reflects Japanese in the mid 20th century. This pattern has undergone recent change.

Content owner: Department of Linguistics Last updated: 12 Mar 2020 12:18pm
