Cocktail Party Problem as Binary Classification
published: July 30, 2009, recorded: June 2009, views: 6633
Report a problem or upload filesIf you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
Speech segregation, or the cocktail party problem, has proven to be extremely challenging. Part of the challenge stems from the lack of a carefully analyzed computational goal. While the separation of every sound source in a mixture is considered the gold standard, I argue that such an objective is neither realistic nor what the human auditory system does. Motivated by the auditory masking phenomenon, we have suggested instead the ideal time-frequency (T-F) binary mask as a main goal for computational auditory scene analysis. Ideal binary masking retains the mixture energy in T-F units where the local signal-to-noise ratio exceeds a certain threshold, and rejects the mixture energy in other T-F units. Recent psychophysical evidence shows that ideal binary masking leads to large speech intelligibility improvements in noisy environments for both normal-hearing and hearing-impaired listeners. The effectiveness of the ideal binary mask implies that sound separation may be formulated as a case of binary classification, which opens the cocktail party problem to a variety of pattern classification and clustering methods. As an example, I discuss a recent system that segregates unvoiced speech by supervised classification of acoustic-phonetic features.
Link this pageWould you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !