Unsupervised feature selection through Gram-Schmidt orthogonalization: A word co-occurrence perspective
Authors: Wang, DQ (Wang, Deqing)[1]; Zhang, H (Zhang, Hui)[1]; Liu, R (Liu, Rui)[1]; Liu, XL (Liu, Xianglong)[1]; Wang, J (Wang, Jing)[2]
NEUROCOMPUTING
DOI: 10.1016/j.neucom.2015.08.038
Published: JAN 15 2016
Abstract
Feature selection is a key step in many machine learning applications, such as categorization and clustering. For text data in particular, the original document-term matrix is high-dimensional and sparse, which degrades the performance of feature selection algorithms. Meanwhile, labeling training instances is time-consuming and expensive, so unsupervised feature selection algorithms have attracted increasing attention. In this paper, we propose an unsupervised feature selection algorithm based on Random Projection and Gram-Schmidt Orthogonalization (RP-GSO) applied to the word co-occurrence matrix. The RP-GSO algorithm has three advantages: (1) it takes the dense word co-occurrence matrix as input, avoiding the sparseness of the original document-term matrix; (2) it selects "basis features" through the Gram-Schmidt process, guaranteeing an orthogonal feature space; and (3) it adopts random projection to speed up the GS process. Extensive experimental results show that the proposed RP-GSO approach outperforms both supervised and unsupervised feature selection methods in text classification and clustering tasks. © 2015 Elsevier B.V. All rights reserved.
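The abstract names the three ingredients of RP-GSO but not the paper's exact formulas, so the following is only a minimal pure-Python sketch under stated assumptions, not the authors' implementation. It assumes (a) the word co-occurrence matrix is C = XᵀX for a documents-by-terms count matrix X; (b) random projection multiplies each term's co-occurrence row by a Gaussian matrix with N(0, 1/k) entries, shrinking it from one dimension per term to k dimensions; and (c) "basis features" are chosen by pivoted Gram-Schmidt, greedily taking the term whose residual (the part not yet spanned by already-chosen terms) has the largest norm. The function names `cooccurrence`, `random_projection`, and `gso_select` and the parameters `k`, `m`, and `tol` are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def cooccurrence(doc_term: Matrix) -> Matrix:
    """Term-by-term co-occurrence matrix C = X^T X for a docs-by-terms matrix X.

    C is dense even when X is sparse: C[i][j] sums x_i * x_j over documents.
    """
    n_terms = len(doc_term[0])
    c = [[0.0] * n_terms for _ in range(n_terms)]
    for row in doc_term:
        nonzero = [(t, x) for t, x in enumerate(row) if x]  # skip absent terms
        for i, xi in nonzero:
            for j, xj in nonzero:
                c[i][j] += xi * xj
    return c


def random_projection(c: Matrix, k: int, seed: int = 0) -> Matrix:
    """Map each term's co-occurrence row to k dimensions with a Gaussian matrix.

    Entries are drawn from N(0, 1/k), the usual Johnson-Lindenstrauss scaling
    (an assumption; the paper's projection matrix may differ).
    """
    rng = random.Random(seed)
    n = len(c[0])
    r = [[rng.gauss(0.0, 1.0 / math.sqrt(k)) for _ in range(k)] for _ in range(n)]
    return [[sum(row[t] * r[t][j] for t in range(n)) for j in range(k)] for row in c]


def gso_select(vectors: Matrix, m: int, tol: float = 1e-10) -> List[int]:
    """Greedily pick up to m 'basis features' by pivoted Gram-Schmidt.

    At each step, choose the unselected feature whose residual has the largest
    norm, normalize that residual into a new orthonormal basis vector, and
    subtract its direction from every residual.
    """
    residuals = [list(v) for v in vectors]
    selected: List[int] = []
    chosen = set()
    for _ in range(m):
        best, best_norm = -1, tol
        for i, res in enumerate(residuals):
            if i in chosen:
                continue
            norm = math.sqrt(sum(x * x for x in res))
            if norm > best_norm:
                best, best_norm = i, norm
        if best < 0:
            break  # every remaining feature is (numerically) in the current span
        selected.append(best)
        chosen.add(best)
        q = [x / best_norm for x in residuals[best]]  # new orthonormal basis vector
        for i, res in enumerate(residuals):
            dot = sum(a * b for a, b in zip(res, q))
            residuals[i] = [a - dot * b for a, b in zip(res, q)]
    return selected


if __name__ == "__main__":
    # Toy corpus: terms 0 and 1 always co-occur, so their rows of C coincide.
    X = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0], [0, 0, 1, 1]]
    C = cooccurrence(X)
    print(gso_select(C, 4))  # [0, 2, 3]
    # Projected variant: inner products now cost O(k) instead of O(n_terms).
    print(gso_select(random_projection(C, 3, seed=1), 3))  # never holds both 0 and 1
```

Because terms 0 and 1 have identical co-occurrence rows, the residual of one vanishes once the other is chosen, so only one of them becomes a basis feature; this is the redundancy-removal effect that orthogonalization is meant to provide. Largest-residual pivoting (the same rule used by column-pivoted QR) is one reasonable selection criterion, but the paper may rank features differently, and a practical version would use sparse or vectorized linear algebra rather than Python lists.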