Dynamic Time Warping’s New Youth
Before Hidden Markov Models (HMMs) became ubiquitous in speech-related applications, pattern matching algorithms such as the well-known Dynamic Time Warping (DTW) algorithm [1] were used extensively for applications such as spoken keyword recognition [2]. The main drawbacks of this technology were its computational cost, given the machinery available at the time, and its lack of generalization when matching acoustic sequences from different speakers or different acoustic contexts. The availability of labeled datasets for training pushed pattern matching techniques aside in favor of HMMs. Still, HMMs have several well-known weaknesses, such as overgeneralization given the training data, lack of robustness to changing noise conditions, and the need for large corpora of well-labeled training data, which limit their suitability for some speech applications. For this reason, some research groups have recently started to look again at DTW as a plausible alternative and have worked on mitigating the issues that made it unsuitable in the past. On the one hand, new acoustic features are being researched [3] to make the matching as independent as possible of the speaker while preserving the spoken content. On the other hand, although computing power has improved greatly since the 1970s, several enhancements to DTW have been proposed [4,5] to make more challenging tasks feasible than in the past. Some of the tasks where pattern-matching approaches, and DTW in particular, are currently applied are: automatic discovery of repeated patterns in speech, query-by-example voice search, pattern-based speech recognition, and analysis of low-resource languages.
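For readers unfamiliar with the algorithm, the following is a minimal sketch of the textbook DTW recurrence, not the specific formulation of any of the cited works. The function name `dtw_distance`, the absolute-difference local distance, and the use of scalar sequences are assumptions made for illustration; speech systems compare sequences of feature vectors and usually add slope or band constraints. The nested loops also make the quadratic cost in sequence length visible, which is the computational drawback mentioned above.

```python
import math


def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Accumulated cost of the best monotonic alignment between a and b.

    Illustrative sketch: `dist` is a hypothetical local distance; real
    systems would use a distance between acoustic feature vectors.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b[j-1] is reused
                cost[i][j - 1],      # b advances, a[i-1] is reused
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]
```

Because a frame in one sequence may be matched to several frames in the other, a sequence and a time-stretched copy of it align at zero cost, for example `dtw_distance([1, 2, 3], [1, 2, 2, 3])`.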
[1] H. Sakoe and S. Chiba, “Dynamic programming algorithm optimization for spoken word recognition,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 26, pp. 43–49, 1978.
[2] A. L. Higgins and R. E. Wohlford, “Keyword recognition using template concatenation,” in Proc. ICASSP, 1985.
[3] G. Aradilla, “Using Posterior-Based Features in Template Matching for Speech Recognition,” in Proc. ICSLP, 2006.
[4] X. Anguera, R. Macrae, and N. Oliver, “Partial Sequence Matching using an Unbounded Dynamic Time Warping Algorithm,” in Proc. ICASSP, 2010.
[5] A. Jansen and B. Van Durme, “Efficient Spoken Term Discovery Using Randomized Algorithms,” in Proc. ASRU, 2011.