Predictive Methods for Text Mining
Author:
Tong Zhang, Yahoo! Research, Yahoo!
Description
I will give a general overview of using prediction methods in text mining applications, including text categorization, information extraction, summarization, and question answering. I will then discuss some of the more advanced issues encountered in real applications, such as structured and complex output classification, the use of unlabeled data, modeling link structures, collective inference and community effects, and transfer learning in changing environments.
| Time | Slide |
| 0:05 | Predictive Methods for Text Mining |
| 1:25 | Motivation for Text Mining |
| 3:01 | Structured Data-mining |
| 4:00 | Structured Data Example |
| 4:29 | Unstructured Text-mining |
| 6:25 | Some Problems in Predictive Text-mining |
| 9:21 | The Machine Learning Approach |
| 10:17 | Supervised learning |
| 11:20 | Outline of the Tutorial |
| 12:27 | Electronic Text |
| 13:23 | An Example of XML Document |
| 13:52 | Text Processing for Predictive Modeling |
| 14:58 | Tokenization |
| 16:17 | Issues in Tokenization |
| 20:22 | Simple English Tokenization Procedure |
| 22:11 | Lemmatization and Stemming |
| 25:46 | Document Level Feature Representation |
| 29:45 | Vector Space Document Model |
| 38:49 | Removal of Stopwords |
| 41:29 | Term Weighting |
| 45:05 | Term Weighting in Document Retrieval |
| 48:08 | Token Statistics: Zipf’s Law |
| 50:55 | Summary of Document Level Feature Generation |
| 53:09 | Example Feature Vector for Email Spam Detection |
| 55:17 | Text Categorization |
| 56:28 | Text Categorization Applications |
| 58:20 | Electronic Spam Detection |
| 61:10 | Taxonomy Classification |
| 62:57 | Basic Text Categorization Framework |
| 65:38 | Probability Calibration |
| 67:18 | Comments on Probability Calibration |
| 69:34 | Common Classification Methods |
| 69:58 | Document Similarity in Vector Space Model |
| 71:50 | Nearest Neighbor Method |
| 72:57 | Centroid Method |
| 76:56 | Example Feature Vector for Email Spam Detection |