Using linguistic information as features for text categorization
Description
We report on our experience using linguistic information as additional features
in a classical Vector Space Model [10]. Information extracted for every word, such as
its part of speech, stem and lexical root, has been combined in different ways
to test for a possible improvement in classification performance
across several algorithms, such as SVM [3], BBR [] and PLAUM [6].
Automatic Text Classification, also known as Automatic Text Categorization,
tries to relate documents to a predefined set of classes. Extensive research
has been carried out on this subject [11], and a wide range of techniques are applicable
to this task: feature extraction [5], feature weighting, dimensionality
reduction [4], machine learning algorithms and more. Besides, the classification
task can be binary (one of two possible classes is selected), multi-class
(one out of a set of possible classes) or multi-label (a set of classes from a larger set
of potential candidates). In most cases, the latter two can be reduced to binary
decisions [1], as the algorithm used in our experiments does [8].
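The reduction of a multi-label task to binary decisions can be illustrated with a minimal sketch. It is not the algorithm used in the experiments [8]; it only shows the common one-binary-problem-per-class idea. The function name `to_binary_problems` and the toy labels are hypothetical.

```python
from typing import Dict, List, Set


def to_binary_problems(labels: List[Set[str]]) -> Dict[str, List[int]]:
    """Build one binary target vector per class from multi-label assignments.

    labels[i] is the set of classes assigned to document i. For each class c,
    the returned vector holds 1 where the document carries c and 0 elsewhere,
    so each class can be handled by an independent binary classifier.
    """
    classes = sorted(set().union(*labels))
    return {
        c: [1 if c in doc_labels else 0 for doc_labels in labels]
        for c in classes
    }


# Hypothetical toy labels for three documents (not taken from the experiments).
doc_labels = [{"grain", "wheat"}, {"earn"}, {"grain"}]
binary_targets = to_binary_problems(doc_labels)
```

Each entry of `binary_targets` can then be passed, together with the document vectors, to any binary learner; a multi-class task is the special case in which every document carries exactly one class.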
In order to verify the contribution of the new features, we have combined
them for inclusion in the vector space model by preprocessing the
Reuters-21578 collection, a data set well known to the research community devoted
to text categorization problems [2].
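As a rough illustration of how linguistic information might be combined into vector space features, the sketch below assumes each token has already been annotated with its stem and part-of-speech tag by an external tool. The scheme names, the `stem/TAG` feature format and the toy document are assumptions made for this example, not the combinations evaluated in the lecture.

```python
from collections import Counter
from typing import List, Tuple

# A token annotated as (word, stem, part-of-speech tag). The annotation
# itself is assumed to come from an external tagger and stemmer.
Token = Tuple[str, str, str]


def make_features(tokens: List[Token], scheme: str) -> List[str]:
    """Map annotated tokens to feature strings under a combination scheme."""
    if scheme == "word":
        return [word.lower() for word, _, _ in tokens]
    if scheme == "stem":
        return [stem for _, stem, _ in tokens]
    if scheme == "stem+pos":
        # Fuse stem and tag so that the same stem with different tags
        # becomes a different dimension of the vector space.
        return [f"{stem}/{pos}" for _, stem, pos in tokens]
    raise ValueError(f"unknown scheme: {scheme}")


def to_vector(features: List[str], vocabulary: List[str]) -> List[int]:
    """Raw term-frequency vector of a document over a fixed vocabulary."""
    counts = Counter(features)
    return [counts[term] for term in vocabulary]


# Hypothetical toy document.
doc = [("Prices", "price", "NNS"), ("rose", "rise", "VBD"), ("prices", "price", "NNS")]
features = make_features(doc, "stem+pos")
vocabulary = sorted(set(features))
vector = to_vector(features, vocabulary)
```

Raw counts are used here only for brevity; a weighting scheme such as TF-IDF would normally be applied to the vectors before training a classifier.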
| Slides | |
| 0:00 | Using linguistic information as features for text categorization (1) |
| 0:43 | Using linguistic information as features for text categorization (2) |
| 0:45 | Using linguistic information as features for text categorization (3) |
| 1:21 | Natural Language Processing (NLP): |
| 2:32 | Text categorization |
| 4:09 | Using linguistic information as features for text categorization |
| 5:28 | Text Categorization - Classification problems (1) |
| 6:31 | Text Categorization - Classification problems (2) |
| 7:04 | Text Categorization - Classification problems (3) |
| 7:19 | Text Categorization - Classification problems (4) |
| 7:55 | Text Categorization - Examples of TC systems (5) |
| 8:55 | Text Categorization - Examples of TC systems (6) |
| 9:42 | Text Categorization - Examples of TC systems (7) |
| 10:33 | Text Categorization - Examples of TC systems (8) |
| 11:24 | Text Categorization - Examples of TC systems (9) |
| 11:33 | Text Categorization - Examples of TC systems (10) |
| 11:52 | Text Categorization - Applications TC systems |
| 16:26 | Text Categorization at the European Laboratory for Particle Physics - An m-label TC problem (1) |
| 18:31 | Text Categorization at the European Laboratory for Particle Physics - An m-label TC problem (2) |
| 18:53 | Text Categorization at the European Laboratory for Particle Physics - An m-label TC problem (3) |
| 19:56 | Text Categorization at the European Laboratory for Particle Physics - An m-label TC problem (4) |
| 21:06 | Text Categorization at the European Laboratory for Particle Physics - An m-label TC problem (5) |
| 21:39 | Text Categorization at the European Laboratory for Particle Physics - An m-label TC problem (6) |
| 22:51 | Text Categorization - Architecture for a ML-based TC system () |
| 24:06 | Text Categorization - Architecture for a ML-based TC system (1) |
| 24:30 | Text Categorization - Architecture for a ML-based TC system (2) |
| 25:13 | Text Categorization - Architecture for a ML-based TC system (3) |
| 25:24 | Text Categorization - Architecture for a ML-based TC system (4) |
| 25:49 | Text Categorization - Architecture for a ML-based TC system (5) |
| 25:59 | Text Categorization - Architecture for a ML-based TC system (6) |
| 27:22 | Text Categorization - Architecture for a ML-based TC system (7) |
| 27:53 | Text Categorization - Architecture for a ML-based TC system (8) |
| 28:27 | Text Categorization - Evaluation issues (1) |
| 29:14 | Text Categorization - Evaluation issues (2) |
| 30:57 | Text Categorization - Evaluation issues (3) |
| 31:44 | Text Categorization - Evaluation issues (4) |
| 32:34 | Text Categorization - Evaluation issues (5) |
| 33:04 | Text Categorization - Tuning a TC system on the HEP corpus (1) |
| 33:57 | Text Categorization - Tuning a TC system on the HEP corpus (2) |
| 35:56 | Text Categorization - Tuning a TC system on the HEP corpus (3) |
| 36:21 | Text Categorization - Tuning a TC system on the HEP corpus (4) |
| 39:37 | Text Categorization - Tuning a TC system on the HEP corpus (5) |
| 41:01 | Text Categorization - Tuning a TC system on the HEP corpus (6) |
| 42:43 | Text Categorization - Tuning a TC system on the HEP corpus (7) |
| 43:20 | Text Categorization - Tuning a TC system on the HEP corpus (8) |
| 43:40 | Text Categorization - Some reflections so far... |
| 45:12 | Using linguistic information as features for text categorization |
| 45:13 | Features for Text Categorization - Bag of words |
| 45:19 | Features for Text Categorization - Linguistic Information |
| 46:09 | Features for Text Categorization - Past work (1) |
| 46:55 | Features for Text Categorization - Past work (2) |
| 47:24 | Features for Text Categorization - Why using highlevel information? (1) |
| 47:51 | Features for Text Categorization - Why using highlevel information? (2) |
| 48:23 | Features for Text Categorization - Why using highlevel information? (3) |
| 48:41 | Features for Text Categorization - Experiments setup |
| 50:23 | Features for Text Categorization - Results (1) |
| 51:11 | Features for Text Categorization - Results (2) |
| 51:17 | Features for Text Categorization - Results (3) |
| 51:29 | Features for Text Categorization - Results (4) |
| 51:41 | Features for Text Categorization - Results (5) |
| 52:17 | Features for Text Categorization - Results (7) |
| 52:46 | Features for Text Categorization - Conclusions |
| 53:38 | Features for Text Categorization - Future directions |
| 54:09 | Thank you very much for your attention! |