NATO Advanced Study Institute on Mining Massive Data Sets for Security

Using linguistic information as features for text categorization

author: Arturo Montejo Ráez, University of Jaén

Description

We report on some experiences using linguistic information as additional features in a classical Vector Space Model [10]. Information extracted for every word, such as its part of speech, stem and lexical root, has been combined in different ways to test for a possible improvement in classification performance across several algorithms, such as SVM [3], BBR [] and PLAUM [6]. Automatic Text Classification, also known as Automatic Text Categorization, tries to relate documents to a predefined set of classes. Extensive research has been carried out on this subject [11], and a wide range of techniques are applicable to this task: feature extraction [5], feature weighting, dimensionality reduction [4], machine learning algorithms and more. Besides, the classification task can be binary (one out of two possible classes), multi-class (one out of a set of possible classes) or multi-label (a set of classes from a larger set of potential candidates). In most cases, the latter two can be reduced to binary decisions [1], as the algorithm used in our experiments does [8]. To verify the contribution of the new features, we have included them in the vector space model by preprocessing the Reuters-21578 collection, a data set well known to the research community devoted to text categorization problems [2].
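As a rough illustration of the idea described above, the sketch below builds term-frequency vectors from tokens annotated with a part-of-speech tag and a stem, and reduces a multi-label problem to one binary decision per class. The abstract does not say how the features were actually combined, so the scheme names (`word`, `stem`, `word+pos`, `stem+pos`), the function names, and the hand-written annotations are all assumptions made for this example, not the author's setup.

```python
from collections import Counter

# Hypothetical pre-annotated tokens: (word, part-of-speech tag, stem).
# In a real pipeline these would come from a tagger and a stemmer;
# here they are written by hand to keep the example self-contained.
DOC = [
    ("banks", "NNS", "bank"),
    ("raised", "VBD", "rais"),
    ("rates", "NNS", "rate"),
    ("banks", "NNS", "bank"),
]


def combine(tokens, scheme):
    """Map annotated tokens to feature strings for the chosen scheme."""
    if scheme == "word":
        return [word for word, _, _ in tokens]
    if scheme == "stem":
        return [stem for _, _, stem in tokens]
    if scheme == "word+pos":
        return [f"{word}/{pos}" for word, pos, _ in tokens]
    if scheme == "stem+pos":
        return [f"{stem}/{pos}" for _, pos, stem in tokens]
    raise ValueError(f"unknown scheme: {scheme}")


def term_frequencies(tokens, scheme):
    """Sparse term-frequency vector (feature -> count) for one document."""
    return Counter(combine(tokens, scheme))


def binary_targets(doc_labels, classes):
    """Reduce a multi-label problem to one 0/1 target list per class."""
    return {
        cls: [int(cls in labels) for labels in doc_labels]
        for cls in classes
    }
```

Calling `term_frequencies(DOC, "stem+pos")` yields one dimension per distinct stem/tag pair, so the same stem used with different tags would occupy separate dimensions. Each list returned by `binary_targets` could then be used to train one binary classifier per class.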

Slides
0:00 Using linguistic information as features for text categorization (1)
0:43 Using linguistic information as features for text categorization (2)
0:45 Using linguistic information as features for text categorization (3)
1:21 Natural Language Processing (NLP):
2:32 Text categorization
4:09 Using linguistic information as features for text categorization
5:28 Text Categorization - Classification problems (1)
6:31 Text Categorization - Classification problems (2)
7:04 Text Categorization - Classification problems (3)
7:19 Text Categorization - Classification problems (4)
7:55 Text Categorization - Examples of TC systems (5)
8:55 Text Categorization - Examples of TC systems (6)
9:42 Text Categorization - Examples of TC systems (7)
10:33 Text Categorization - Examples of TC systems (8)
11:24 Text Categorization - Examples of TC systems (9)
11:33 Text Categorization - Examples of TC systems (10)
11:52 Text Categorization - Applications TC systems
16:26 Text Categorization at the European Laboratory for Particle Physics - An m-label TC problem (1)
18:31 Text Categorization at the European Laboratory for Particle Physics - An m-label TC problem (2)
18:53 Text Categorization at the European Laboratory for Particle Physics - An m-label TC problem (3)
19:56 Text Categorization at the European Laboratory for Particle Physics - An m-label TC problem (4)
21:06 Text Categorization at the European Laboratory for Particle Physics - An m-label TC problem (5)
21:39 Text Categorization at the European Laboratory for Particle Physics - An m-label TC problem (6)
22:51 Text Categorization - Architecture for a ML-based TC system ()
24:06 Text Categorization - Architecture for a ML-based TC system (1)
24:30 Text Categorization - Architecture for a ML-based TC system (2)
25:13 Text Categorization - Architecture for a ML-based TC system (3)
25:24 Text Categorization - Architecture for a ML-based TC system (4)
25:49 Text Categorization - Architecture for a ML-based TC system (5)
25:59 Text Categorization - Architecture for a ML-based TC system (6)
27:22 Text Categorization - Architecture for a ML-based TC system (7)
27:53 Text Categorization - Architecture for a ML-based TC system (8)
28:27 Text Categorization - Evaluation issues (1)
29:14 Text Categorization - Evaluation issues (2)
30:57 Text Categorization - Evaluation issues (3)
31:44 Text Categorization - Evaluation issues (4)
32:34 Text Categorization - Evaluation issues (5)
33:04 Text Categorization - Tuning a TC system on the HEP corpus (1)
33:57 Text Categorization - Tuning a TC system on the HEP corpus (2)
35:56 Text Categorization - Tuning a TC system on the HEP corpus (3)
36:21 Text Categorization - Tuning a TC system on the HEP corpus (4)
39:37 Text Categorization - Tuning a TC system on the HEP corpus (5)
41:01 Text Categorization - Tuning a TC system on the HEP corpus (6)
42:43 Text Categorization - Tuning a TC system on the HEP corpus (7)
43:20 Text Categorization - Tuning a TC system on the HEP corpus (8)
43:40 Text Categorization - Some reflections so far...
45:12 Using linguistic information as features for text categorization
45:13 Features for Text Categorization - Bag of words
45:19 Features for Text Categorization - Linguistic Information
46:09 Features for Text Categorization - Past work (1)
46:55 Features for Text Categorization - Past work (2)
47:24 Features for Text Categorization - Why using highlevel information? (1)
47:51 Features for Text Categorization - Why using highlevel information? (2)
48:23 Features for Text Categorization - Why using highlevel information? (3)
48:41 Features for Text Categorization - Experiments setup
50:23 Features for Text Categorization - Results (1)
51:11 Features for Text Categorization - Results (2)
51:17 Features for Text Categorization - Results (3)
51:29 Features for Text Categorization - Results (4)
51:41 Features for Text Categorization - Results (5)
52:17 Features for Text Categorization - Results (7)
52:46 Features for Text Categorization - Conclusions
53:38 Features for Text Categorization - Future directions
54:09 Thank you very much for your attention!
