Rule-based modelling of non-native pronunciation variants
for speech technology applications

Note: This text was written in 2002/2003, so it may be outdated in some aspects. For more recent information related to this research, see my full list of publications.

Project summary

Pronunciation dictionaries are a crucial component of speech recognition and speech synthesis systems, as they form the link between the acoustic and symbolic levels of automatic speech and language processing. Typically, each entry in a lexicon is assigned a phonetic transcription that represents its canonical form, i.e. its standard pronunciation in the language the system is designed for. Canonical lexicons, however, have the general drawback that every marked deviation from the standard form leads to a mismatch between the lexicon transcription and the actual pronunciation. In Automatic Speech Recognition (ASR), this may cause a significant decline in recognition performance. In recent years, a number of lexical adaptation techniques have been proposed to compensate for this mismatch, e.g. adding alternative pronunciation variants to the lexicon, generating these variants using phonological rules, or building pronunciation networks. Usually these techniques are applied to model frequently occurring intra-lingual variations such as within-word or cross-word assimilations or elisions in informal speech.

The aim of this research project is to extend the lexicon adaptation approach from intra-lingual variation to the domain of foreign-accented pronunciation. Non-native speakers frequently produce variants that deviate markedly from the canonical form. These variants are characterized by phenomena such as changes in allophonic realizations, phoneme shifts, word stress shifts, and even alterations in syllable structure caused by epenthesis or deletion of speech sounds. One source (among others) of these mispronunciations is the transfer of phonetic elements and rules from the speaker's native language onto the target language.

The idea of modelling these errors by lexicon adaptation is based on the assumption that for each language direction - i.e. a pair of native language (L1) and target language (L2) - a number of characteristic pronunciation errors can be identified. Although there is a considerable range of inter-individual variation even among speakers with the same native language background (due to variables such as L2 proficiency, age, education, dialectal origin, etc.), it is assumed that common mispronunciations can be formulated as rewrite rules that generate prototypical interlanguage transcriptions.
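The notion of a rewrite rule operating on a canonical transcription can be illustrated with a minimal Python sketch. It is not the project's rule interpreter: the class RewriteRule, the functions apply_rule and apply_rules, and the two demonstration rules (a substitution of the dental fricative and a word-final devoicing, written in SAMPA-like symbols for the direction GER -> ENG) are assumptions chosen for illustration only and are not taken from the actual rule sets.

```python
from dataclasses import dataclass
from typing import List, Optional

BOUNDARY = "#"  # hypothetical symbol marking the word edge


@dataclass(frozen=True)
class RewriteRule:
    """A rule of the form: target -> replacement / left _ right."""
    target: str
    replacement: str
    left: Optional[str] = None   # required preceding symbol (None = any)
    right: Optional[str] = None  # required following symbol (None = any)


def apply_rule(rule: RewriteRule, phonemes: List[str]) -> List[str]:
    """Apply one rule to every matching position of a transcription."""
    padded = [BOUNDARY] + list(phonemes) + [BOUNDARY]
    out = []
    for i in range(1, len(padded) - 1):
        sym = padded[i]
        matches = (
            sym == rule.target
            and (rule.left is None or padded[i - 1] == rule.left)
            and (rule.right is None or padded[i + 1] == rule.right)
        )
        out.append(rule.replacement if matches else sym)
    return out


def apply_rules(rules: List[RewriteRule], phonemes: List[str]) -> List[str]:
    """Apply an ordered rule list; each rule sees the previous rule's output."""
    result = list(phonemes)
    for rule in rules:
        result = apply_rule(rule, result)
    return result


# Illustrative rules only (assumed, not from the project's rule sets).
GER_TO_ENG_DEMO = [
    RewriteRule("T", "s"),                  # dental fricative -> /s/
    RewriteRule("d", "t", right=BOUNDARY),  # word-final devoicing
]

if __name__ == "__main__":
    print(apply_rules(GER_TO_ENG_DEMO, ["b", "A:", "T"]))
    print(apply_rules(GER_TO_ENG_DEMO, ["b", "r", "{", "d", "f", "@", "d"]))
```

In this sketch the rules are applied in a fixed order and each one is obligatory, so every input word yields exactly one accented transcription; the word-final context ensures that only the last /d/ of the second example is affected.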

Currently, the languages investigated are German (GER), English (ENG), and French (FR) in different L1/L2 combinations; an extension to additional languages is envisaged. A prototype of a task-specific rule interpreter has been implemented. Phonological rule sets for the language directions ENG -> GER, GER -> FR, GER -> ENG, and FR -> GER have been developed and are continually being updated and modified. These rules are based on actual pronunciation variants observed in a non-native speech database. The research is currently limited to the domain of foreign city names, but the findings are expected to generalize to other lexical domains.

For the purposes of this project, a database of non-native speech was compiled. It contains non-native pronunciation variants of city and town names from five European languages (English, German, French, Italian, and Dutch) spoken by native speakers of English, German, French, Italian, and Spanish. To account for potential inter-speaker variability, at least 20 speakers per native language were recorded. The recordings comprised both a reading task and a repetition task, using the same words for both tasks. This makes it possible to isolate the influence of spelling pronunciation on the speakers' productions. The database and the experimental setting for the recordings are described in full detail in Schaden (2002).

In its current state, the rule system includes sets of postlexical accent rules for English, French, and German. Each language direction has approximately 80 to 100 rules. The rules generate several prototypical foreign-accented variants per input word, using phoneme substitution rules that draw on various types of linguistic information, as described in Schaden (2003).
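How a set of substitution rules can yield several variants per input word may be sketched as follows. This is a deliberately simplified illustration, not the method described in Schaden (2003): it treats every substitution as optional and context-free, and it ignores the additional linguistic information that the actual rules use. The table DEMO_SUBSTITUTIONS, the function generate_variants, and the parameter max_variants are hypothetical names introduced here.

```python
from itertools import product
from typing import Dict, List

# Hypothetical substitution table in SAMPA-like symbols (illustration only):
# each canonical phoneme maps to its possible accented realizations.
DEMO_SUBSTITUTIONS: Dict[str, List[str]] = {
    "T": ["s", "t"],
    "w": ["v"],
}


def generate_variants(
    canonical: List[str],
    substitutions: Dict[str, List[str]],
    max_variants: int = 10,
) -> List[List[str]]:
    """Enumerate variants by optionally substituting each phoneme.

    The canonical form is always the first variant returned.
    """
    # For each position: the canonical symbol followed by its alternatives.
    options = [[p] + substitutions.get(p, []) for p in canonical]
    variants: List[List[str]] = []
    for combo in product(*options):
        variants.append(list(combo))
        if len(variants) >= max_variants:
            break  # cap the output to keep the lexicon manageable
    return variants


if __name__ == "__main__":
    for variant in generate_variants(["w", "I", "T"], DEMO_SUBSTITUTIONS):
        print(" ".join(variant))
```

Because the number of combinations grows multiplicatively with the number of substitutable phonemes, some form of cap or ranking is needed in practice; the simple cutoff used here is only one possible choice.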


Schaden, S., 2002. "A Database for the Analysis of Cross-lingual Pronunciation Variants of European City Names." Proceedings Third International Conference on Language Resources and Evaluation (LREC 2002), Las Palmas de Gran Canaria, Spain, Vol. 4, 1277-1283.

Schaden, S., 2003. "Generating Non-Native Pronunciation Lexicons by Phonological Rules." Proceedings 15th International Congress of Phonetic Sciences (ICPhS 2003), Barcelona, Spain.