The phonemes and visemes in the AVOZES data corpus were placed in central position in CVC- or VCV-contexts (CVC = consonant-vowel-consonant, VCV = vowel-consonant-vowel) so that they are free of any phonological or lexical restrictions. However, wherever possible, existing English words that fit these contexts were favoured over nonsense words, to make it easier for the speakers to familiarise themselves with the speech material. The vowel context for VCV-words was the wide open /ɑː/ ("arCar"). The voiced bilabial /b/ was used as the consonant context ("bVb") for CVC-words. The opening and closing of a bilabial viseme clearly marks the beginning and end of the vocalic nucleus, and thus facilitates the visual analysis. Using /b/ instead of /p/ lengthens each word, giving more data to analyse.
A disadvantage of the /bVb/ context is that bilabials cause strong coarticulation effects in the formants. However, these effects are quite predictable for /b/, and we believe that the advantages of a bilabial context for visual segmentation outweigh the disadvantages from coarticulation.
To avoid the typical articulation patterns associated with reading words from a list, each CVC- and VCV-word was embedded in the carrier phrase "You grab /WORD/ beer." Having a bilabial opening and closing before and after the word under investigation again helps with visual segmentation, in particular for the VCV-words. Tables 4 and 5 list the prompts and pronunciation hints that were presented to the speakers during familiarisation and recording. Each phrase to be read aloud by the speakers was shown at the top of the on-screen prompt message, followed by an example of how to pronounce the phoneme under investigation in that prompt. An example of such a prompt message is shown below.
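The context and carrier-phrase rules above can be sketched in a few lines. This is a minimal illustration only, assuming phonemes are given as IPA strings and that each phoneme is already labelled as vowel or consonant; the function names are hypothetical, and the actual AVOZES prompts (Tables 4 and 5) use existing English words or orthographic spellings such as "BAB" and "ARGAR" rather than raw IPA.

```python
# Hypothetical sketch of the prompt construction described above.
# Assumption: input phonemes are IPA strings; real prompts use English
# words or spellings (e.g. "BAB", "ARGAR") instead of IPA.

CARRIER = "You grab {word} beer."


def cvc_word(vowel: str) -> str:
    """Place a vowel between voiced bilabials: /bVb/."""
    return f"b{vowel}b"


def vcv_word(consonant: str) -> str:
    """Place a consonant between wide open vowels: /ɑːCɑː/ ("arCar")."""
    return f"ɑː{consonant}ɑː"


def prompt(phoneme: str, is_vowel: bool) -> str:
    """Choose the CVC or VCV context, then embed it in the carrier phrase."""
    word = cvc_word(phoneme) if is_vowel else vcv_word(phoneme)
    return CARRIER.format(word=word)


if __name__ == "__main__":
    print(prompt("æ", is_vowel=True))
    print(prompt("g", is_vowel=False))
```

Vowels go into the /bVb/ frame and consonants into the /ɑːCɑː/ frame, so every target phoneme ends up in the same carrier phrase with bilabial boundaries on both sides.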

Example of a prompt message on the screen during recording, as viewed by the speakers
Two phonemes from the lists in Tables 1 and 2 were omitted (see also the prompt lists in Tables 4 and 5) because they occur rarely in AuE. These phonemes were /ʒ/ (as in "azure") and /ʊə/ (as in "tour"). It was therefore considered likely that speakers would not pronounce the prompts correctly. These two phonemes were also rather difficult to fit into the selected CVC- and VCV-contexts. Furthermore, the neutral vowel /ə/ and the neutral consonant /h/ were not recorded, because it was assumed that, due to their neutrality, they would add little to the statistical analysis of relationships between audio and video speech parameters. In hindsight, it might have been better to record these four phonemes as well for completeness, even if speakers had difficulties producing the correct pronunciation. However, thanks to the modular design of the AVOZES data corpus, these sequences can be added in future. During the recordings it also became evident that some speakers had difficulties producing distinct sounds for the voiceless and voiced interdental fricatives /θ/ and /ð/, as well as producing the velar nasal /ŋ/. The analysis of these sequences must therefore be treated with care.
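For users of the corpus, the omissions and caveats above amount to a simple filter over the phoneme inventory. The sketch below is hypothetical: the function names and the example inventory are illustrative assumptions, and the full lists from Tables 1 and 2 are not reproduced here.

```python
# Hypothetical sketch: which phonemes to expect in AVOZES, based on the
# omissions and caveats described above. The example inventory is an
# illustrative subset, not the full lists of Tables 1 and 2.

OMITTED = {"ʒ", "ʊə", "ə", "h"}     # not recorded in AVOZES
TREAT_WITH_CARE = {"θ", "ð", "ŋ"}   # some speakers had difficulties


def recorded(inventory: list[str]) -> list[str]:
    """Return the phonemes from the inventory that were actually recorded."""
    return [p for p in inventory if p not in OMITTED]


def needs_care(phoneme: str) -> bool:
    """True if the analysis of this phoneme's sequences needs extra care."""
    return phoneme in TREAT_WITH_CARE


if __name__ == "__main__":
    example = ["θ", "ð", "ʒ", "ŋ", "h", "s"]  # hypothetical subset
    for p in recorded(example):
        print(p + (" (treat with care)" if needs_care(p) else ""))
```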
Example Sequence
Note: The example sequences are provided for informational purposes only, so that you can judge whether AVOZES is the right data corpus for you. You may use them for internal evaluation purposes only. For all other uses, including academic research, a licence must be acquired (non-commercial (academic) licence or commercial licence).
Download an example sequence (6.9MB, AVI) of "You grab BAB beer."
Download an example sequence (6.9MB, AVI) of "You grab ARGAR beer."