Structured Prediction Problems in Natural Language Processing
introducer: William Cohen, Carnegie Mellon University
Description
Modeling language at the syntactic or semantic level is a key problem in natural language processing, and involves a challenging set of structured prediction problems. In this talk I'll describe work on machine learning approaches for syntax and semantics, with a particular focus on lexicalized grammar formalisms such as dependency grammars, tree adjoining grammars, and categorial grammars. I'll address key issues in the following areas: 1) the design of learning algorithms for structured linguistic data; 2) the design of representations that are used within these learning algorithms; 3) the design of efficient approximate inference algorithms for lexicalized grammars, in cases where exact inference can be very expensive. In addition, I'll describe applications to machine translation, and natural language interfaces.
| Slides | |
| 0:00 | Structured Prediction Problems in Natural Language Processing |
| 4:06 | Structured Prediction Problems |
| 5:23 | Examples of Structured Prediction Problems |
| 6:28 | Syntactic Structures (1) |
| 7:03 | Syntactic Structures (2) |
| 7:42 | Structured Prediction Models for Parsing (1) |
| 8:43 | Structured Prediction Models for Parsing (2) |
| 9:00 | Structured Prediction Models for Parsing (3) |
| 9:23 | Structured Prediction Models for Parsing (4) |
| 9:59 | Overview |
| 10:12 | Context-Free Grammars (CFGs) for Language |
| 11:32 | Motivation for Parsing: Grammatical Relations |
| 12:23 | Syntactic Structures and Grammatical Relations (1) |
| 13:19 | Syntactic Structures and Grammatical Relations (2) |
| 13:33 | Syntactic Structures and Grammatical Relations (3) |
| 13:36 | Probabilistic Context-Free Grammars (PCFGs) |
| 15:29 | Overview |
| 15:33 | Models: Key Points |
| 16:27 | Conditional Random Fields |
| 16:58 | The Building Blocks for CRFs: Feature Vectors |
| 18:08 | Conditional Random Fields |
| 19:03 | Generalizing CRFs to Parsing |
| 19:34 | A TAG-Style Formalism |
| 20:41 | A Combination Operation: Sister Adjunction (1) |
| 21:32 | A Combination Operation: Sister Adjunction (2) |
| 21:41 | A Combination Operation: Sister Adjunction (3) |
| 22:12 | The Decomposition into Spines and Adjunctions |
| 22:36 | Advantages of TAG |
| 23:59 | The Contrast with Context-free Grammars |
| 24:18 | Features on Adjunctions (1) |
| 25:18 | Features on Adjunctions (2) |
| 25:21 | Features on Adjunctions (3) |
| 25:24 | Features on Adjunctions (4) |
| 25:29 | Features on Adjunctions (5) |
| 25:31 | Features on Adjunctions (6) |
| 25:59 | Features on Adjunctions (7) |
| 26:00 | A TAG-Based Model |
| 26:50 | Experiments |
| 27:41 | Probabilistic Context-Free Grammars (PCFGs) |
| 28:05 | PCFGs with Parent Annotations |
| 28:49 | Lexicalized PCFGs |
| 29:39 | PCFGs with Latent Variables |
| 30:26 | A Comparison to PCFGs (1) |
| 30:51 | A Comparison to PCFGs (2) |
| 31:19 | A Comparison to PCFGs (3) |
| 31:35 | Overview |
| 31:51 | An Application: Translation |
| 32:33 | Phrase-based Translation (Och et al., 1999) (1) |
| 33:12 | Phrase-based Translation (Och et al., 1999) (2) |
| 33:34 | Phrase-based Translation (Och et al., 1999) (3) |
| 34:42 | Translation as Parsing (1) |
| 35:31 | Translation as Parsing (2) |
| 35:51 | Translation as Parsing (3) |
| 36:15 | Translation as Parsing (4) |
| 36:22 | Properties of Translation as Parsing |
| 36:28 | Translation as Parsing (4) |
| 36:31 | Properties of Translation as Parsing |
| 37:25 | Overview |
| 37:28 | Inference |
| 37:45 | Inference: Key Points |
| 39:10 | Dependency Structures |
| 40:53 | Efficient Parsing Algorithms (Eisner 1997, 2000) |
| 42:00 | TAG Parses and Dependency Structures |
| 42:40 | Applying Eisner’s Algorithms to our Formalism |
| 43:20 | Coarse-to-fine Dynamic Programming |
| 44:02 | Three Simple Dependencies In Every Adjunction |
| 45:15 | Effect of the Beam (Validation Data) |
| 45:58 | Overview |
| 46:04 | Conditional Random Fields |
| 46:20 | Efficiency is a Key Problem |
| 47:13 | Parameter Estimation: the Structured Perceptron |
| 49:02 | Parameter Estimation: Averaging |
| 50:02 | Properties of the Perceptron |
| 51:33 | Averaged Perceptron Convergence |
| 52:04 | Regularized Log-Likelihood for Training CRFs |
| 53:17 | The Dual |
| 55:16 | Dual Coordinate Descent |
| 56:30 | An Exponentiated Gradient (EG) Algorithm |
| 58:06 | Properties of the EG Algorithm |
| 58:43 | An Exponentiated Gradient (EG) Algorithm |
| 58:48 | Properties of the EG Algorithm |
| 60:45 | Comparison to L-BFGS on a Parsing Problem (Objective Value) |
| 61:14 | Comparison to L-BFGS on a Parsing Problem (Accuracy) |
| 61:35 | Conclusions |
Lecture rating
| People found this lecture: | ||
| Worth seeing | ||
| because it is: | ||
| Valuable and informative | ||
| Well presented | ||
| Easily understandable | ||
| Acceptably recorded | ||
| You need to login to cast your vote. | ||
Report a problem or upload files
If you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
Related content
SEE ALSO:
Link this page
Would you like to put a link to this lecture on your homepage?Go ahead! Copy the HTML snippet !





