Towards Combining Web Classification and Web Information Extraction: A Case Study
Description
Web content analysis often has two sequential and separate steps: Web classification, which identifies the target Web pages, and Web information extraction, which extracts the metadata contained in those pages. This decoupled strategy is highly ineffective because errors in Web classification propagate to Web information extraction and accumulate there. In this paper we study the mutual dependencies between these two steps and propose to combine them using a model based on Conditional Random Fields (CRFs). This model simultaneously recognizes the target Web pages and extracts the corresponding metadata. Systematic experiments in OfCourse, our project for online course search, show that the model significantly improves the F1 value for both steps. We believe that our model can be easily generalized to many Web applications.
Slides
| Time | Title |
| 0:00 | Towards Combining Web Classification and Web Information Extraction: a Case Study |
| 0:24 | Web Content Analysis for Vertical Search |
| 1:29 | OfCourse |
| 2:23 | Web Classification and Web Information Extraction |
| 3:10 | Contributions |
| 4:57 | Motivating Examples (1) |
| 5:51 | Motivating Examples (2) |
| 6:31 | Problem Formulation (1) |
| 7:17 | Problem Formulation (2) |
| 7:38 | The Graphical Model (1) |
| 7:51 | The Graphical Model (2) |
| 8:11 | The Graphical Model (3) |
| 8:42 | The Graphical Model (4) |
| 9:02 | Expressing the Conditional Probability |
| 9:12 | Parameter Learning |
| 9:32 | Model Inference with Constrained Output (1) |
| 10:17 | Model Inference with Constrained Output (2) |
| 10:50 | Baseline Methods |
| 10:54 | Experimental Results |
| 11:16 | Conclusions and Discussion |
| 12:06 | OfCourse |

interesting work~