Webpage Understanding: an Integrated Approach
Published on Sep 14, 2007
6582 Views
Recent work has shown the effectiveness of leveraging layout and tag-tree structure for segmenting webpages and labeling HTML elements. However, how to effectively segment and label the text contents
Chapter list
Webpage Understanding: an Integrated Approach 00:03
Outline 00:32
Motivating Examples 00:50
Characteristics of Webpage 02:10
Tasks of Web Data Extraction 03:39
Slide 6 04:49
Existing Attempts – De-coupled Approaches 04:57
Disadvantages 05:36
Why no integrated approach? 06:16
Outline 07:10
Statistical Web Structure Mining Model (KDD 2006) 07:33
Integrated Webpage Understanding Model 08:50
Factorized Distribution 09:59
Separate Learning 13:04
Outline 13:26
Experiments 13:33
Extraction Accuracy 14:20
NP-Chunking Features 15:02
Conclusions & Future Work 15:35