Online MKL for Structured Prediction
published: Jan. 12, 2011, recorded: December 2010, views: 3589
Report a problem or upload filesIf you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
Structured prediction (SP) problems are characterized by strong interdependence among the output
variables, usually with sequential, graphical, or combinatorial structure [17, 7]. Obtaining good
predictors often requires a large effort in feature/kernel design and tuning (usually done via crossvalidation).
Because discriminative training of structured predictors can be quite slow, specially in
large scale settings, it is appealing to learn the kernel function simultaneously.
In multiple kernel learning (MKL, ), the kernel is learned as a linear combination of prespecified
base kernels. This framework has been made scalable with the advent of wrapper-based methods,
in which a standard learning problem (e.g., an SVM) is repeatedly solved in an inner loop up to
a prescribed accuracy [14, 11, 6]. Unfortunately, extending such methods to large-scale SP raises
practical hurdles: since the output space is large, so are the kernel matrices, and the number of
support vectors; moreover, it becomes prohibitive to tackle the inner problem in its batch form, and
one usually resorts to online algorithms [12, 3, 13]. The latter are fast learners but slow optimizers
; using them in the inner loop with early stopping may misguide the overall MKL optimization.
We propose instead a stand-alone online MKL algorithm which exploits the large-scale tradeoffs
directly. The algorithm iterates between subgradient and proximal steps, and has important advantages:
(i) it is simple, flexible, and compatible with sparse and non-sparse variants of MKL,
(ii) it is adequate for SP,
(iii) it offers regret, convergence, and generalization guarantees.
Experimentally, the proposed algorithm yields near-optimal solutions faster than wrapper-based approaches (adapted to SP). We use the two following SP experimental testbeds: sequence labeling for handwritten text recognition, and natural language dependency parsing. The potential of our approach is shown in learning combinations of kernels from tens of thousands of training instances, with encouraging results in terms of speed, accuracy, and interpretability.
Link this pageWould you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !