Dynamic hybrid clustering of bioinformatics by incorporating text mining and citation analysis
Description
To unravel the concept structure and dynamics of the bioinformatics field, we analyze a set of 7401 publications from the Web of Science and MEDLINE databases, covering publication years 1981–2004. A novel bibliometric retrieval strategy is used to delineate this complex, interdisciplinary field. Given that the performance of unsupervised clustering and classification of scientific publications is significantly improved by deeply merging textual content with the structure of the citation graph, we proceed with a hybrid clustering method based on Fisher’s inverse chi-square. The optimal number of clusters is determined by a compound semiautomatic strategy that combines distance-based and stability-based methods. We also investigate the relationship between the number of Latent Semantic Indexing factors, the number of clusters, and clustering performance. The HITS and PageRank algorithms are used to determine representative publications in each cluster. Next, we develop a methodology for dynamic hybrid clustering of evolving bibliographic data sets. The same clustering methodology is applied to consecutive periods defined by time windows on the set, and in a subsequent phase chains are formed by matching and tracking clusters through time. Term networks for the eleven resulting cluster chains present the cognitive structure of the field. Finally, we provide a view on how much attention the bioinformatics community has devoted to the different subfields through time.
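The integration step named in the description, Fisher’s inverse chi-square, can be illustrated with a minimal sketch. It assumes that each document pair has one text-based and one citation-based distance, that each distance is turned into a p-value against a null sample of distances, and that the two p-values are combined with equal weight. The function names, the null samples, and the equal weighting are illustrative assumptions; this page does not state how the lecture derives or weights the p-values.

```python
import math
from bisect import bisect_right


def empirical_p_value(distance, null_distances):
    """Fraction of null distances <= the observed one (assumed null sample).

    The +1 terms keep the p-value strictly positive so that log() is defined.
    """
    null_sorted = sorted(null_distances)
    rank = bisect_right(null_sorted, distance)
    return (rank + 1) / (len(null_sorted) + 1)


def fisher_combined_p(p_text, p_citation):
    """Combine two p-values with Fisher's inverse chi-square method.

    The statistic X = -2 * (ln p_text + ln p_citation) follows a chi-square
    distribution with 4 degrees of freedom under the null hypothesis, whose
    survival function has the closed form exp(-X/2) * (1 + X/2).
    """
    statistic = -2.0 * (math.log(p_text) + math.log(p_citation))
    half = statistic / 2.0
    return math.exp(-half) * (1.0 + half)


# Hypothetical distances for one document pair and hypothetical null samples.
p_text = empirical_p_value(0.25, [0.1, 0.2, 0.3, 0.4])
p_cite = empirical_p_value(0.15, [0.1, 0.2, 0.3, 0.4])
integrated = fisher_combined_p(p_text, p_cite)
```

A small combined p-value means both sources agree that the pair is unusually close, so under these assumptions the combined value could serve as the integrated distance passed to the hierarchical clustering step.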
| Slides | |
| 0:03 | Dynamic Hybrid Clustering of Bioinformatics by Incorporating Text Mining and Citation Analysis |
| 0:21 | Overview of the presentation |
| 0:53 | General context |
| 1:42 | Agglomerative hierarchical clustering |
| 2:46 | Indexing in Vector Space Model |
| 3:37 | Bibliometrics and network analysis |
| 4:29 | Hybrid (integrated) clustering |
| 4:49 | Hybrid clustering: intermediate integration |
| 6:18 | Weighted linear combination (linco) |
| 7:20 | Fisher’s inverse chi-square method (1) |
| 7:42 | Fisher’s inverse chi-square method (2) |
| 8:53 | Fisher’s inverse chi-square method (3) |
| 9:26 | Conclusions from previous research |
| 10:21 | Dynamic hybrid mapping of bioinformatics |
| 10:53 | Number of clusters and LSI factors |
| 12:39 | Number of clusters: stability diagram |
| 13:02 | Number of clusters: link-based Silhouette values |
| 13:15 | Dendrogram |
| 13:41 | slide 19 |
| 15:21 | slide 20 |
| 16:10 | Dynamics |
| 17:11 | Dynamic term networks |
| 17:21 | Conclusions (1) |
| 18:05 | Conclusions (2) |
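
The dynamic part of the description, in which clusters from consecutive time windows are matched and tracked to form chains, could be sketched as follows. This is only an illustration under stated assumptions: each cluster is represented by a hypothetical term-weight profile, clusters are matched by cosine similarity, and a link is kept only above an arbitrary threshold. The matching criterion actually used in the lecture is not given on this page.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_predecessor(profile, previous, threshold):
    """Name of the most similar cluster in the previous period, or None."""
    best_name, best_sim = None, threshold
    for name, prev_profile in previous.items():
        sim = cosine(profile, prev_profile)
        if sim >= best_sim:
            best_name, best_sim = name, sim
    return best_name


def build_chains(periods, threshold=0.3):
    """Link clusters across consecutive periods into chains.

    `periods` is a list of dicts mapping cluster name -> term profile.
    A cluster with no sufficiently similar predecessor starts a new chain,
    as does a cluster whose predecessor's chain was already extended.
    """
    chains = [[(0, name)] for name in periods[0]]
    for t in range(1, len(periods)):
        for name, profile in periods[t].items():
            parent = best_predecessor(profile, periods[t - 1], threshold)
            chain = next((c for c in chains if c[-1] == (t - 1, parent)), None)
            if parent is not None and chain is not None:
                chain.append((t, name))
            else:
                chains.append([(t, name)])
    return chains


# Hypothetical term profiles for two time windows.
periods = [
    {"A": {"gene": 1.0, "expression": 1.0}, "B": {"protein": 1.0, "structure": 1.0}},
    {"C": {"gene": 1.0, "microarray": 1.0}, "D": {"ontology": 1.0}},
]
chains = build_chains(periods)
```

Here cluster C continues the chain started by A because they share the term "gene", while B and D remain single-period chains. Splits and merges of clusters are deliberately not modeled in this sketch.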




