published: July 4, 2012, recorded: May 2012, views: 445
Taxonomies are a useful mechanism to organize, evaluate, and search web content. Many popular classes of web applications rely on them, from product categorization, comparative pricing of similar products, and localized services to vertical or enterprise search. However, manual generation and maintenance by experts is time-consuming and cumbersome, and often results in platform-dependent, static vocabularies. Hence much current research focuses on more flexible and dynamic methods to develop them, as evidenced, for example, by the strong interest in folksonomies within the social media realm. We propose a new approach for constructing taxonomies. Our idea stems from the increasing willingness of users to tag and annotate web content (e.g., in social media and product categorization applications). We define the required input from human users as explicit structural information, that is, supertype-subtype relationships between concepts, which humans understand well. In this way, we harvest, via common annotation practices, the collective wisdom of users with respect to the (categorization of) web content they share and access. We further define the principles upon which crowdsourced taxonomy construction algorithms should be based. We show that the resulting problem is NP-hard. We provide heuristic algorithms and relevant optimizations that aggregate human input, resolve conflicts within it, and produce taxonomies. We evaluate our algorithms on real-world crowdsourcing experiments (where real users provide such information) and on real-world taxonomies.
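To make the aggregation step concrete, the following is a minimal illustrative sketch, not the algorithm presented in the lecture. It assumes that each user contribution is a single (supertype, subtype) pair and that conflicts are resolved greedily by vote count; the function name `build_taxonomy` and the sample votes are hypothetical.

```python
from collections import Counter


def build_taxonomy(votes):
    """Greedy sketch: turn (supertype, subtype) votes into a forest.

    Assumption: the better-supported relationship wins a conflict.
    Returns a dict mapping each subtype to its chosen supertype.
    """
    counts = Counter(votes)
    parent = {}  # subtype -> chosen supertype

    def is_ancestor_or_self(a, b):
        # Walk up from b; True if we reach a (or if a == b).
        while b is not None:
            if b == a:
                return True
            b = parent.get(b)
        return False

    # Most-voted relationships first; names break ties deterministically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for (sup, sub), _n in ranked:
        if sub in parent:
            continue  # a better-supported supertype was already chosen
        if is_ancestor_or_self(sub, sup):
            continue  # adding this edge would create a cycle
        parent[sub] = sup
    return parent


# Hypothetical crowd input with two conflicts:
# "dog" has two candidate supertypes, and one user reversed animal/dog.
votes = (
    [("animal", "dog")] * 3
    + [("mammal", "dog")] * 2
    + [("animal", "mammal")] * 2
    + [("dog", "animal")]
    + [("dog", "puppy")]
)
taxonomy = build_taxonomy(votes)
```

A greedy pass like this is only a heuristic: it can discard a less popular but more specific relationship (here, "mammal" as the supertype of "dog"), which is the kind of trade-off a principled construction algorithm has to address.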