2nd Joint Workshop on Multimodal Interaction and Related Machine Learning Algorithms

Microphone Array Driven Speech Recognition: Influence of Localization on the Word Error Rate

author: Matthias Wolfel, University of Karlsruhe

Description

To realize the long-term goal of ubiquitous computing, interest within the automatic speech recognition research community has recently focused on recognizing speech captured by microphones located in the medium field, rather than mounted on a headset and positioned next to the speaker's mouth. This is a natural application for beamforming techniques using a microphone array. A crucial ingredient for optimal performance of beamforming techniques is the speaker location. Hence, to apply such techniques, a source localization algorithm is required. In prior work, we proposed using an extended Kalman filter to directly update position estimates in a speaker localization system based on time delays of arrival. We have also enhanced our audio localizer with video information. In this work, we investigate the influence of the speaker position on the word error rate of an automatic speech recognition system operating on the output of a beamformer, and compare this error rate with that obtained with a close-talking microphone. Moreover, we compare the effectiveness of different localization algorithms. We tested our algorithm on a data set consisting of seminars held by actual speakers. Our experiments revealed that accurate speaker tracking is crucial for minimizing the errors of a far-field speech recognition system.
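
The word error rate used as the evaluation metric above is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The sketch below shows that standard definition in Python. It is an illustration only and not the scoring tool used in the lecture: the function name `word_error_rate` and the whitespace tokenization are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption for this sketch: words are separated by whitespace and
    compared exactly (no text normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference. Comparing the rate for the beamformer output against the rate for the close-talking channel, both scored against the same reference, gives the kind of comparison described in the abstract.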

Slides
0:00 Microphone Array Driven Speech Recognition: Influence of Localization on the Word Error Rate
0:17 The CHIL Seminar Room Layout at the Universität Karlsruhe (TH)
1:08 Components
2:30 Data Fusion with Particle Filter
3:27 Speaker Localization: Audio Features
4:23 Images from Calibrated Cameras
4:52 Speaker Localization: Video Features (I)
5:27 Speaker Localization: Video Features (II)
6:09 Position Estimation
7:05 Video of Tracking
8:05 3D Head Position Estimation Error
9:44 Word Error Rates
11:20 Average Position Error vs. Word Error Rate
12:02 Questions or Comments are Welcome
