Domain-Centric Information Extraction
Published on Jul 13, 2012 (3348 views)
Chapter list
Domain-Centric Information Extraction 00:00
IMDb 00:46
yelp 01:11
B&H 01:14
San Francisco public library 01:16
Cornell University library 01:18
Scaling IE 01:26
Domain-centric Information Extraction - schema 03:21
Outline 03:40
Part I : Analysis of Data on the Web 04:45
Questions 04:55
Spread (1) 05:52
Spread (2) 07:18
Restaurants phones 08:04
Restaurants homepages 10:08
Aggregate reviews 10:42
The long tail of websites 11:26
Connectivity 11:47
Graph 12:26
High degree of redundancy and overlap 14:25
Part II : Domain-centric extraction from script-generated sites 14:44
A primer on script-generated sites 15:19
Wrapper 15:43
Domain-centric Extraction 16:35
Our Extraction Pipeline 18:02
Step 1 : Discover 19:13
Step 2 : Cluster 20:36
Previous Techniques 21:05
Example 21:45
Our Approach 23:05
Step 3 : Annotate 23:56
Step 4 : Extract 25:50
Where to buy 26:05
Our Approach 27:20
Enumeration 28:23
Ranking 29:05
Components in Ranking 29:45
Experiments 30:50
XPath wrappers on dealers dataset 31:46
XPath wrappers on discography dataset 32:37
Conclusions 32:57
Thank you 33:33