Comparison of information retrieval techniques: Latent semantic indexing (LSI) and Concept indexing (CI)
published: Feb. 25, 2007, recorded: November 2003, views: 1605
Report a problem or upload filesIf you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
Information retrieval in the vector space model is based on literal matching of terms in the documents and the queries. The model is implemented by creating the term-document matrix, which is formed on the base of frequencies of terms in documents. Literal matching of terms does not necessarily retrieve all relevant documents. Synonymy (multiple words having the same meaning) and polysemy (words having multiple meaning) are two major obstacles for efficient information retrieval. Latent semantic indexing (LSI) and concept indexing (CI) are information retrieval techniques embedded in the vector space model, which address the problem of synonymy and polysemy. The method of LSI is an information retrieval technique using a low-rank singular value decomposition (SVD) of the term-document matrix. Although the LSI method has empirical success, it suffers from the lack of interpretation for the low-rank approximation and, consequently, the lack of controls for accomplishing specific tasks in information retrieval. The method of CI uses centroids of clusters or so-called concept decomposition (CD) for lowering the rank of the term-document matrix. Here we compare SVD/LSI and CD/CI in terms of matrix approximations and precision of information retrieval.
Link this pageWould you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !