
Comprehensive List of Researchers "Information Knowledge"

Department of Media Science

KITAOKA, Norihide
Speech and Image Science Group
Associate Professor
Dr. of Engineering
Research Field
Speech recognition / Spoken dialog

Current Research

Speech/Spoken Language Processing
Spoken dialog is the most natural communication method, used by almost all humans. If computers can recognize, manage, and synthesize speech, speech can serve not only as an excellent means of communication but also as a medium for storing data.
I am engaged in research on spoken language technologies.
■Noisy speech recognition
Degradation of recognition performance in noise is a problem in practical speech systems. Standard evaluation frameworks for noisy speech recognition are very useful for comparing the many noise reduction methods. I lead the developers' group for the standardized evaluation framework series CENSREC, which contains data, recognition tools, and evaluation tools; the frameworks are distributed freely to the public.
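As one classic example of the noise reduction methods such frameworks are used to compare, the sketch below implements simple magnitude spectral subtraction with a spectral floor. The parameter values and the flat noise estimate are illustrative assumptions, not part of the CENSREC tools.

```python
def spectral_subtraction(noisy_mag, noise_mag, floor=0.02):
    """Subtract an estimated noise magnitude spectrum from the noisy
    speech spectrum, frequency bin by bin, clamping each bin to a small
    fraction of the noisy spectrum so magnitudes never go negative."""
    return [max(s - n, floor * s) for s, n in zip(noisy_mag, noise_mag)]

# Toy example: one 4-bin spectral frame with a flat noise estimate.
noisy = [1.0, 0.75, 0.5, 0.25]
noise = [0.25, 0.25, 0.25, 0.25]
print(spectral_subtraction(noisy, noise))  # → [0.75, 0.5, 0.25, 0.005]
```

The flooring step is what keeps over-subtracted bins from becoming negative, at the cost of leaving some residual noise.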
■Large vocabulary continuous speech recognition
Making transcriptions of monologues such as lectures is a very promising research area. We improve acoustic modeling of the human voice based on Hidden Markov Models (HMMs) and statistical language modeling based on N-grams. We also improve the decoding algorithm.
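As a toy illustration of statistical (N-gram) language modeling, the sketch below builds a bigram model (N = 2) with add-one smoothing over a tiny made-up corpus; real LVCSR systems are trained on far larger text collections.

```python
from collections import Counter

# Toy training corpus with sentence boundary markers (an assumption
# for illustration; not data from the actual research).
corpus = ["<s> speech recognition is fun </s>",
          "<s> speech synthesis is fun </s>"]

unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    words = sentence.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

vocab_size = len(unigrams)

def bigram_prob(prev, word):
    """P(word | prev) with add-one (Laplace) smoothing, so unseen
    word pairs still receive a small nonzero probability."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

# "speech" is followed by "recognition" in 1 of its 2 occurrences.
print(bigram_prob("speech", "recognition"))  # 2/9 ≈ 0.222
```

During decoding, such probabilities are combined with HMM acoustic scores to rank candidate word sequences.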
■Spoken dialog interface (1) -for a friendly interface-
Novice users' first impression of a spoken dialog system is often that it is "impassive": the time lag between a user utterance and the system reply is so long that the user cannot tell whether the system is working. This is one reason why users do not feel that spoken dialog systems are comfortable and friendly to use.
Thus, we focus on prosodic features such as timing and pitch change in dialog. Our dialog system speaks with appropriate prosodic features, taking previous user utterances into account. When the dialog becomes "lively," the pitch of the system's utterances chases the user's pitch.
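The pitch-"chasing" behavior can be caricatured as a simple adaptation rule: the system's fundamental frequency (F0) moves toward the user's F0, faster when the dialog is lively. The liveliness measure, rates, and the function below are illustrative assumptions, not the group's actual model.

```python
def adapt_pitch(system_f0, user_f0, liveliness, base_rate=0.1, max_rate=0.6):
    """Move the system's F0 (Hz) toward the user's F0.
    liveliness in [0, 1] interpolates the adaptation rate between
    base_rate (calm dialog) and max_rate (lively dialog)."""
    rate = base_rate * (1.0 - liveliness) + max_rate * liveliness
    return system_f0 + rate * (user_f0 - system_f0)

# Calm dialog: small shift.  Lively dialog: the system chases the user.
print(adapt_pitch(120.0, 180.0, liveliness=0.0))  # → 126.0
print(adapt_pitch(120.0, 180.0, liveliness=1.0))  # → 156.0
```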
We also study semantic dialog strategy: we are developing a robust and natural response generation method for a system that takes its own misunderstandings into account.
■Spoken dialog interface (2) -Automatically responding...-
A system that works only when the user would like it to: such a system stays silently near the user, but when the user wants to talk to it, it responds naturally and gets to work. To realize such a system, we employ various cues, such as the user's orientation, changes of speaking style, and the content of the user's utterances.
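One simple, hypothetical way to combine such cues is a weighted score compared against a response threshold. The cue names, weights, and threshold below are illustrative assumptions, not the actual decision model.

```python
def should_respond(cues, weights=None, threshold=0.5):
    """Decide whether the always-listening system should respond.
    cues maps cue names to graded values in [0, 1]; missing cues
    count as 0. Returns True when the weighted sum reaches threshold."""
    weights = weights or {"facing_system": 0.4,          # user orientation
                          "speaking_style_shift": 0.3,   # change of style
                          "content_addresses_system": 0.3}  # utterance content
    score = sum(weights[name] * cues.get(name, 0.0) for name in weights)
    return score >= threshold

# User turns toward the terminal and shifts speaking style: respond.
print(should_respond({"facing_system": 1.0, "speaking_style_shift": 1.0}))
# Side talk to another person, no cues fire: stay silent.
print(should_respond({}))
```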
■Multimodal interface
For mobile information terminals, we are developing multimodal interfaces using speech and spoken-dialog input. Combinations of speech with a touch pen, a touch panel, or finger pointing, for example, are promising.
■Summarization/Indexing of spoken documents
Nowadays, we can store huge amounts of video and audio content, in which most of the linguistic information is carried by speech. Summarization and indexing based on the speech in the content facilitate its use. Our challenge is to develop robust speech recognition, summarization, and indexing techniques.
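On the indexing side, a minimal sketch is an inverted index built from recognition transcripts, mapping each word to the audio segments that contain it. The segment IDs and transcripts below are made up for illustration.

```python
# Hypothetical recognition output: segment ID -> transcript text.
transcripts = {
    "lecture1_0000": "today we discuss speech recognition",
    "lecture1_0030": "noise robustness is a key problem",
    "lecture2_0000": "speech summarization helps browsing",
}

# Inverted index: word -> set of segment IDs containing that word.
inverted_index = {}
for segment_id, text in transcripts.items():
    for word in set(text.split()):
        inverted_index.setdefault(word, set()).add(segment_id)

# Retrieve every audio segment whose transcript mentions "speech".
print(sorted(inverted_index["speech"]))  # → ['lecture1_0000', 'lecture2_0000']
```

Because recognition output contains errors, practical systems also index alternative hypotheses rather than only the single best transcript.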


  • Norihide Kitaoka received his B.S. and M.S. degrees from Kyoto University. In 1994, he joined DENSO CORPORATION.
  • In 2000, he received his Ph.D. degree from Toyohashi University of Technology (TUT).
  • He joined TUT as a Research Associate in 2001 and was a Lecturer from 2003 to 2006.
  • Since 2006, he has been an associate professor at Nagoya University.

Academic Societies

  • The Acoustical Society of Japan
  • The Institute of Electronics, Information and Communication Engineers
  • Information Processing Society of Japan
  • The Japanese Society for Artificial Intelligence


Publications

  1. Noisy Speech Recognition Based on Integration/Selection of Multiple Noise Suppression Methods Using Noise GMMs, IEICE Trans. Inf. & Syst., Vol. E91-D, No. 3, pp. 411-421, 2008.
  2. A Spoken Dialog System for Chat-like Conversations Considering Response Timing, Text, Speech and Dialogue, pp. 599-606, Springer, Sep. 2007.
  3. Robust distant speaker recognition based on position-dependent CMN by combining speaker-specific GMM with speaker-adapted HMM, Speech Communication, Vol. 49, Issue 6, pp. 501-513, 2007.