WIKImage: Correlated image and text datasets

author: Doni Pracner, Department of Mathematics and Informatics, University of Novi Sad
published: Nov. 4, 2011,   recorded: October 2011,   views: 3332

This paper presents work towards the creation of free, redistributable datasets of correlated images and text. Collections of free images and their related text were extracted from Wikipedia with our new tool, WIKImage. An additional tool, the WIKImage browser, was introduced to visualize the resulting dataset and was later extended into a manual labeling tool. The paper presents an initial dataset of 1007 images, each labeled with any combination of 14 tags. The images were processed into a number of scale-invariant feature transform (SIFT) and color histogram features, and the captions were transformed into a bag-of-words (BOW) representation. To estimate how difficult the dataset is in different feature spaces, classification experiments were then run for each label separately on three variants of the dataset: image features only, text features only, and both combined. Results indicate improvements in precision, recall, and the F-measure when the combined representation is used, both with support vector machines and with the k-nearest neighbor classifier using the cosine similarity measure.
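The abstract does not spell out the exact pipeline, so the following is only a minimal sketch of the general idea: per-label binary classification with a cosine-similarity k-nearest neighbor classifier on a combined image-plus-text representation, scored with precision, recall, and the F-measure. Several choices are assumptions rather than details from the paper: a lowercase whitespace tokenizer for the bag-of-words, toy three-bin "color histograms" standing in for the real SIFT and color features, L2-normalized concatenation as the way of combining modalities, and a majority vote with k=3. All function names and data are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(caption, vocabulary):
    """Term counts over a fixed vocabulary (assumed tokenizer: lowercase + whitespace split)."""
    counts = Counter(caption.lower().split())
    return [float(counts[term]) for term in vocabulary]


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def combine(image_features, text_features):
    """Concatenate normalized image and text vectors (assumed weighting: equal, after L2 normalization)."""
    return l2_normalize(image_features) + l2_normalize(text_features)


def cosine(a, b):
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def knn_predict(train_x, train_y, query, k=3):
    """Binary prediction for one label: majority vote of the k most cosine-similar training items."""
    ranked = sorted(range(len(train_x)), key=lambda i: cosine(train_x[i], query), reverse=True)
    votes = sum(train_y[i] for i in ranked[:k])
    return 1 if 2 * votes > k else 0


def precision_recall_f1(true, pred):
    """Precision, recall, and F-measure (F1) for one binary label."""
    tp = sum(1 for t, p in zip(true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(true, pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data (hypothetical): (color histogram, caption, has the "bridge" tag?)
vocabulary = ["bridge", "river", "portrait", "painting"]
train = [
    ([0.1, 0.7, 0.2], "stone bridge over the river", 1),
    ([0.2, 0.6, 0.2], "river bridge at night", 1),
    ([0.7, 0.2, 0.1], "portrait painting of a king", 0),
    ([0.6, 0.3, 0.1], "oil painting portrait", 0),
]
test = [
    ([0.15, 0.65, 0.2], "old bridge", 1),
    ([0.65, 0.25, 0.1], "painting of a queen", 0),
]

train_x = [combine(img, bag_of_words(cap, vocabulary)) for img, cap, _ in train]
train_y = [label for _, _, label in train]
test_x = [combine(img, bag_of_words(cap, vocabulary)) for img, cap, _ in test]
test_y = [label for _, _, label in test]

preds = [knn_predict(train_x, train_y, q) for q in test_x]
print(preds)                               # [1, 0]
print(precision_recall_f1(test_y, preds))  # (1.0, 1.0, 1.0)
```

Normalizing each modality before concatenation keeps a long caption or a large histogram from dominating the cosine similarity. Dropping either half of `combine` yields the image-only or text-only variants that the experiments compare against.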

Slides: sikdd2011_pracner_wikimage_01.pdf (727.7 KB)