Väyrynen P (2005)
Perspectives on the utility of linguistic knowledge in English word prediction
Dissertation, Acta Univ Oul Humaniora B 67, Faculty of Humanities, University of Oulu, Finland
This thesis addresses the utility of linguistic knowledge in one domain of language technology, word prediction. An important characteristic of any practical language technology application is its level of performance, so it is essential to be able to measure this quantitatively. The main questions are the following: (1) how can a significant improvement in performance be obtained in practical language technology products, and (2) what is the cost of improved performance in terms of the sources of linguistic knowledge that should be incorporated in them? On a more general level, the major findings suggest that the practical utility of linguistic knowledge in language technology should be evaluated from at least three larger perspectives: (1) language, (2) technology, and (3) the user of the application. From these three perspectives, a variety of constraints can be identified which either increase or decrease the usefulness of linguistic knowledge in practical language technology applications. A state-of-the-art statistical word prediction system was developed and tested in the empirical part of this work. Tests of a few prediction methods that utilise sources of linguistic knowledge showed that they can perform just as well as some existing state-of-the-art statistical prediction methods. For example, when the syllable-initial characters of the words to be predicted were used, the expected length of the search key in a running text with a prediction list of ten tokens was only 1.59 characters. By contrast, using information on the parts of speech of the word tokens to be predicted, in a system with five lists representing five parts of speech, resulted in only a three percent improvement in performance.
One of the practical implications of these results for the field of language technology is that a significant improvement in the performance of a word prediction system may be achieved only incrementally. The simultaneous use of several techniques may in turn degrade the real-time operation of the prediction system, so that it is unable to suggest candidate words quickly enough for the user. It can also affect other performance measures, such as the average percentage of keystrokes or characters saved.
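The two measures mentioned in the abstract, the length of the search key and the percentage of keystrokes saved, can be illustrated with a minimal sketch. It is not the system developed in the thesis: it assumes a plain word-frequency predictor, ignores spaces and punctuation, and counts one extra keystroke for selecting a word from the prediction list. All function names are hypothetical, and the thesis may define these measures differently.

```python
from collections import Counter


def build_model(training_words):
    """Assumed baseline model: raw word frequencies from a training text."""
    return Counter(training_words)


def predict(model, prefix, list_size):
    """Return up to list_size words starting with prefix, most frequent first."""
    candidates = [(w, c) for w, c in model.items() if w.startswith(prefix)]
    # Sort by descending frequency, then alphabetically for a stable order.
    candidates.sort(key=lambda wc: (-wc[1], wc[0]))
    return [w for w, _ in candidates[:list_size]]


def search_key_length(model, word, list_size):
    """Characters typed before the word shows up in the prediction list.

    Returns len(word) if the word never appears in the list.
    """
    for k in range(len(word) + 1):
        if word in predict(model, word[:k], list_size):
            return k
    return len(word)


def evaluate(model, test_words, list_size):
    """Mean search key length and keystroke savings (assumed definitions)."""
    key_lengths = [search_key_length(model, w, list_size) for w in test_words]
    total_chars = sum(len(w) for w in test_words)
    # Typing k characters plus one selection keystroke, never more than
    # typing the whole word by hand.
    keystrokes = sum(min(k + 1, len(w)) for k, w in zip(key_lengths, test_words))
    return {
        "mean_key_length": sum(key_lengths) / len(test_words),
        "keystroke_savings": 1 - keystrokes / total_chars,
    }


if __name__ == "__main__":
    model = build_model("the cat sat on the mat the cat".split())
    print(evaluate(model, ["the", "cat", "mat"], list_size=2))
```

In this sketch a longer prediction list shortens the search key but gives the user more candidates to scan, which is one way the user perspective described above constrains the design.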