Label | 96mue2 |
Title | Predicting the Out-of-Vocabulary Rate and the Required Vocabulary Size for Speech Processing Applications |
Authors | Johannes Müller, Holger Stahl, Manfred Lang |
Type | Scientific Conference Paper |
Abstract | This paper describes an approach for predicting both the vocabulary size and the resulting out-of-vocabulary rate (OOV-rate) for a hypothetical extension of an existing text corpus. By splitting the original corpus into two sub-corpora, the vocabulary size and the OOV-rate can be determined for that particular split. Average values are calculated over all combinations of sub-corpora and can be approximated by analytic functions. These functions allow the vocabulary size and the OOV-rate to be predicted easily. The prediction achieves a relative error below 4.6%. Keywords: out-of-vocabulary rate, OOV-rate, vocabulary size, text corpus, test corpus, training corpus |
Reference | Proceedings ICSLP 96 (Philadelphia, USA, 1996), pp. 658-661, doi.org/10.1109/ICSLP.1996.608010 |
Published | October 1996 |
Language | English |
Download | Scientific Conference Paper as pdf file (48 kByte) |
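The abstract outlines a three-step procedure: split the corpus into sub-corpora, measure vocabulary size and OOV-rate for each split, then average over all combinations and fit an analytic function that can be extrapolated to a larger corpus. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation. It assumes pre-tokenized input, a division into `k` equal chunks where some chunks form the training sub-corpus and each remaining chunk serves in turn as a test sub-corpus, and a Heaps'-law style power function y = a·x^b as the analytic form. The paper's actual splitting scheme and function terms may differ. The helper names `split_chunks`, `averaged_curve` and `fit_power_law`, as well as the synthetic corpus, are hypothetical.

```python
import math
import random
from itertools import combinations


def split_chunks(tokens, k):
    """Split a token list into k contiguous chunks of equal size (tail dropped)."""
    size = len(tokens) // k
    return [tokens[i * size:(i + 1) * size] for i in range(k)]


def averaged_curve(tokens, k):
    """For m = 1..k-1 training chunks, average the training vocabulary size and
    the OOV rate of each held-out chunk over every combination of training chunks.
    Returns a list of (training_tokens, mean_vocab_size, mean_oov_rate)."""
    chunks = split_chunks(tokens, k)
    chunk_len = len(chunks[0])
    curve = []
    for m in range(1, k):
        v_sum = oov_sum = 0.0
        n = 0
        for train_ids in combinations(range(k), m):
            vocab = {t for i in train_ids for t in chunks[i]}
            for j in range(k):
                if j in train_ids:
                    continue  # only held-out chunks serve as test data
                test = chunks[j]
                oov_sum += sum(t not in vocab for t in test) / len(test)
                v_sum += len(vocab)
                n += 1
        curve.append((m * chunk_len, v_sum / n, oov_sum / n))
    return curve


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b in log-log space; returns (a, b).
    Assumed analytic form (Heaps'-law style), not necessarily the paper's.
    All xs and ys must be strictly positive."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxx = sum((x - mx) ** 2 for x in lx)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lx, ly))
    b = sxy / sxx
    return math.exp(my - b * mx), b


# Hypothetical stand-in corpus: 6000 tokens drawn from a Zipf-like
# distribution over 5000 word types (not data from the paper).
rng = random.Random(0)
types = list(range(5000))
tokens = rng.choices(types, weights=[1 / (r + 1) for r in types], k=6000)

curve = averaged_curve(tokens, k=6)
sizes = [n for n, _, _ in curve]
a_v, b_v = fit_power_law(sizes, [v for _, v, _ in curve])
a_o, b_o = fit_power_law(sizes, [o for _, _, o in curve])

target = 12000  # hypothetical extended corpus size in tokens
print(f"predicted vocabulary: {a_v * target ** b_v:.0f}")
print(f"predicted OOV rate:   {a_o * target ** b_o:.3%}")
```

Fitting in log-log space turns the power law into a straight line, so ordinary least squares suffices and no numerical library is needed. Because every larger training set is a superset of smaller ones, the averaged vocabulary size can only grow and the averaged OOV-rate can only shrink as more chunks are used, which is what makes extrapolation to `target` meaningful.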