Humanism and Technology
Dr. Juhani Toivanen,
Although current statistical methods used in different areas of language technology may already solve many problems in everyday life, computers do not really understand natural language yet. The mere possibility of using natural spoken or written language in the interaction between humans and computers noticeably increases the usability of different applications. Automatic speech and text recognition are key technologies in interfaces of this kind: in the future, computers could, for example, recognize the emotions and attitudes of their users directly from the acoustic structure of speech.
Thematically and scientifically, the establishment of MediaTeam’s Language and Audio Technology team in 1999 was a response to the urgent need for an integrated approach to research on digital databases of text and audio material. From the very beginning, the aim of the team’s research was to utilize and develop methods for pattern recognition, digital signal processing and the automatic interpretation, recognition and processing of speech and audio signals and text within the framework of artificial intelligence. The most important application areas still include new search engines for movie and sound recording databases, as well as computer tools for the design and description of audiovisual content. The results of MediaTeam's research in this field have not gone unnoticed: Dr. Juhani Toivanen, a member of the team, was appointed to the post of Academy of Finland Research Fellow for August 2004 - July 2009. In his research, Toivanen studies the vocal correlates of emotions in spoken Finnish and their multi-parametric prosodic analysis, broadening and deepening his earlier research on the phonetic and linguistic correlates of emotions conveyed by Finnish speech.
The Language and Audio Technology team was established in its initial form when Professor Tapio Seppänen, who had defended his doctoral dissertation at the University of Oulu some ten years earlier, left VTT Electronics to rejoin the ranks of the Faculty of Technology at the University of Oulu. At VTT Electronics, Professor Seppänen had worked as a senior and chief research scientist in 1993-1998 and as the head of the Electronic Circuits and Systems division in 1998-1999.
The mission of the Language and Audio Technology team was essentially cross-disciplinary; it was therefore necessary to recruit a group of promising researchers representing different areas and interests. In 1999, Ilkka Juuso, Kai Noponen and Eero Väyrynen joined the team as young students of electrical engineering. Pertti Väyrynen, who held an M.A. degree in English Philology at the time, also joined in. Väyrynen had a background in linguistics and considerable experience in language technology: his company, PhoneSoft, produced translation software systems. Dr. Juhani Toivanen made his first contribution in 2000 as a part-time researcher; he had a background in English Philology and language teaching, and had also worked as a university lecturer in phonetics and general linguistics. The funding of the team’s research was – and still is – almost entirely dependent on outside sources, such as the National Technology Agency and the Academy of Finland, significantly backed up by funding from industry.
Currently, the Language and Audio Technology team consists of one professor, five full-time researchers and a part-time researcher. It participates actively in the Osaava Pohjois-Suomi (Capable Northern Finland) project, a large undertaking of northern universities and polytechnics which aims to promote cross-disciplinary research in Northern Finland within key areas such as wellness technology, ICT, and the travel industry. MediaTeam’s contribution – the development of new algorithms and software for digital signal analysis and classification – is significant: a lexicon and phraseology system comprising several languages and genres relating to health issues and enabling automatic translation via wireless devices would be a great asset in a clinical context.
Related MediaTeam Projects
Prosody of Emotions 1/2003-12/2006
The Prosody of Emotions project was a part of a more extensive program called “Multidisciplinary research project on the expression of emotion in spoken Finnish” that consisted of three partner departments and laboratories from the Helsinki University of Technology, the University of Oulu and the University of Tampere.
The project aimed to deepen and broaden the linguistic research of emotional correlates in spoken language and find potential targets of application, especially in the field of information technology. A specific aim was to investigate the phonetic and phonological correlates that convey emotions in spoken Finnish and to develop pattern recognition methods for classification of emotional speech signals. The top-level goal of this research was to develop the first systematic descriptive model for spoken Finnish that would take several linguistic layers into account: the aim was to create a theoretical model which could be utilized in multi-dimensional speech corpora and search engines. In the project, speech signals were analyzed both instrumentally (with speech algorithms) and auditorily to investigate their phonological structure. The results of the project determined, for the first time, to what degree the automatic classification of emotion from spoken Finnish is attainable.
In connection with this project, the MediaTeam Emotional Speech Corpus was collected. The database is the largest emotional speech corpus for spoken Finnish, and also one of the largest globally. Also in connection with the project, the f0Tool, a speech analysis software package utilizing MATLAB, was developed. The f0Tool has been verified to be highly reliable for automatic prosodic analysis of large quantities of speech data; it can also be used on speech data produced in very demanding conditions.
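As a rough illustration of what automatic prosodic analysis involves, the following Python sketch estimates the fundamental frequency (f0) of a single speech frame with a textbook autocorrelation method. It is not the f0Tool's algorithm, which this text does not describe; the function name `estimate_f0`, the 75-500 Hz search range and the voicing threshold of 0.3 are all assumptions made for the example.

```python
import math


def estimate_f0(frame, sample_rate, f0_min=75.0, f0_max=500.0, threshold=0.3):
    """Estimate f0 (Hz) of one frame from its autocorrelation peak.

    Hypothetical helper: returns None when the frame is empty, silent,
    or shows no sufficiently strong periodicity.
    """
    n = len(frame)
    if n == 0:
        return None
    # Remove the DC offset so it does not dominate the correlation.
    mean = sum(frame) / n
    x = [s - mean for s in frame]
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return None  # silent frame
    # Candidate pitch periods, expressed as lags in samples.
    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(int(sample_rate / f0_min), n - 1)
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag, normalized by the frame energy.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag is None or best_r < threshold:
        return None  # treated as unvoiced
    return sample_rate / best_lag


if __name__ == "__main__":
    # Synthetic 200 Hz tone, 50 ms long, sampled at 8 kHz.
    tone = [math.sin(2 * math.pi * 200 * i / 8000) for i in range(400)]
    print(estimate_f0(tone, 8000))  # prints 200.0
```

Running such an estimator frame by frame over a recording yields an f0 contour, from which summary measures such as mean pitch or pitch range can be derived. Real speech data, especially data recorded in demanding conditions, would need considerably more robust processing than this sketch provides.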
Financiers and Business Partners
- The Academy of Finland
Semantic Gap
Semantic Gap was a joint project of MediaTeam and the Department of Information Studies in the Faculty of Humanities focusing on the indexing of databases and content-based retrieval of audio and video recordings. Thematically, the project was closely connected with the Vikings project, and its results were extensively tested and applied in the Vikings project.
The central aim of the project was to narrow down the semantic gap between the concept-based and content-based approaches to database indexing. By narrowing the semantic gap, it would be possible to design increasingly efficient databases and search engines. The research challenges concerned rapidly growing media types, such as digital speech, music, and images, where search criteria often included semantic concepts.
The research questions lay at the interface between technology and semantic/cognitive information science, and only a genuinely cross-disciplinary team could hope to tackle them. In the end, the undertaking proved highly successful and its results were put to use: the search engine was benchmarked in the international VideoTREC competition, part of an annual conference series sponsored by the National Institute of Standards and Technology and other U.S. government agencies.
Financiers and Business Partners
- The Academy of Finland
Vikings
The Vikings project was carried out in cooperation with VTT Electronics. In the project, new content-based retrieval systems for searches in movie and sound recording databases were developed.
The project’s goals were the development of methods required in content-based multimedia retrieval, the development of novel language technology and the testing of this technology in service applications. Key technologies included digital signal processing, digital image analysis, pattern recognition, visualization, and search engine technology.
The researchers developed new artificial intelligence technologies that made it possible to detect the emotional state of speakers (with a focus on the Finnish and English languages) from the speech signal automatically and almost as successfully as people do. New image processing techniques were also developed for interpreting video content: changes in color content in the spatial and temporal domains were measured, and the images were classified accordingly. Finally, the algorithms were integrated into a search engine that combined the audio and video features to achieve higher-level semantic representations.
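One common way to measure temporal changes in color content is to compare the color histograms of consecutive frames. The Python sketch below illustrates that general idea only: the text does not say which measures the project used, the spatial domain is not covered here, and the function names, the four bins per channel and the representation of a frame as a list of (r, g, b) tuples are assumptions made for the example.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram of one frame.

    `pixels` is a non-empty list of (r, g, b) tuples with values in 0..255.
    """
    if not pixels:
        raise ValueError("frame has no pixels")
    b = bins_per_channel
    hist = [0.0] * (b ** 3)
    for red, green, blue in pixels:
        # Map each channel value to a bin index in 0..b-1.
        idx = ((red * b // 256) * b + (green * b // 256)) * b + (blue * b // 256)
        hist[idx] += 1.0
    total = float(len(pixels))
    return [count / total for count in hist]


def histogram_distance(h1, h2):
    """Half the L1 distance: 0.0 for identical, 1.0 for disjoint histograms."""
    return 0.5 * sum(abs(a - c) for a, c in zip(h1, h2))


def temporal_color_changes(frames, bins_per_channel=4):
    """Color change between each pair of consecutive frames."""
    hists = [color_histogram(f, bins_per_channel) for f in frames]
    return [histogram_distance(a, c) for a, c in zip(hists, hists[1:])]


if __name__ == "__main__":
    red = [(255, 0, 0)] * 4
    blue = [(0, 0, 255)] * 4
    print(temporal_color_changes([red, red, blue]))  # prints [0.0, 1.0]
```

A large value between two frames suggests an abrupt change in color content, such as a cut between shots, while values near zero indicate visually stable material. A classifier could use such values as one feature among many.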
Financiers and Business Partners
- OPOY/Finnet Group
- National Technology Agency
Related Publications
Toivanen J, Waaramaa T, Alku P, Laukkanen A-M, Seppänen T, Väyrynen E, Airas M (2006) Emotions in /a:/: a perceptual and acoustic study. Logopedics Phoniatrics Vocology, 31: 43-48.
Toivanen J & Waaramaa T (2005) Tone choice and voice quality of dispreferred turns in the English of Finns. Logopedics Phoniatrics Vocology, 30: 181-184.
Toivanen J, Seppänen T, Väyrynen E (2004) Automatic discrimination of emotion from spoken Finnish. Language and Speech, 47: 383-412.
Iivonen A, Seppänen T, Noponen K & Toivanen J (2004) Puhujan temporaalisen äänialan visualisointisovellus. Puhe ja kieli 24(1):5–16 (in Finnish).
Toivanen J (2004) Pitch dynamism of English produced by proficient non-native speakers: preliminary results of a corpus-based analysis of second language speech. Proc. FONETIK 2004, Stockholm, Sweden, 48–51.
Seppänen T, Toivanen J & Väyrynen E (2003) MediaTeam Speech Corpus: a first large Finnish emotional speech database. Proc. 15th International Congress of Phonetic Sciences, Barcelona, Spain, 3:2469-2472.
Suomi K, Toivanen J, Ylitalo R (2003) Durational and tonal correlates of accent in Finnish. Journal of Phonetics, 31: 113-138.
Toivanen J & Seppänen T (2002) Prosody-based search features in information retrieval. Proc. Fonetik 2002, Stockholm, Sweden, 105-108.
Väyrynen P, Seppänen T, Noponen K & Juuso I (2002) On the usefulness of linguistic knowledge in different areas of application in language technology. Informaatiotutkimus 3/2002: 59-66 (in Finnish).
Väyrynen P, Peltola J & Seppänen T (2000) Enhancing phoneme recogniser performance with a simple rule-based language model. Proc. STeP 2000 - Finnish Artificial Intelligence Days, Espoo, Finland, 171-178.
Väyrynen P (2005) Perspectives on the utility of linguistic knowledge in English word prediction. Dissertation, Acta Univ Oul Humaniora B 67, Faculty of Humanities, University of Oulu, Finland.
Related Master's Theses
Väyrynen E (2005) Automatic emotion recognition from speech. M.Sc. thesis, Department of Electrical and Information Engineering, University of Oulu, Finland.