Papers

Unsupervised feature selection through Gram-Schmidt orthogonalization-A word co-occurrence perspective

Posted: 2016-10-12

Authors: Wang, DQ (Wang, Deqing) [1]; Zhang, H (Zhang, Hui) [1]; Liu, R (Liu, Rui) [1]; Liu, XL (Liu, Xianglong) [1]; Wang, J (Wang, Jing) [2]

NEUROCOMPUTING  

Volume: 173

Pages: 845-854

DOI: 10.1016/j.neucom.2015.08.038  

Published: JAN 15 2016

Abstract

Feature selection is a key step in many machine learning applications, such as categorization and clustering. Especially for text data, the original document-term matrix is high-dimensional and sparse, which affects the performance of feature selection algorithms. Meanwhile, labeling training instances is time-consuming and expensive, so unsupervised feature selection algorithms have attracted more attention. In this paper, we propose an unsupervised feature selection algorithm through Random Projection and Gram-Schmidt Orthogonalization (RP-GSO) from the word co-occurrence matrix. The RP-GSO algorithm has three advantages: (1) it takes as input a dense word co-occurrence matrix, avoiding the sparseness of the original document-term matrix; (2) it selects "basis features" by the Gram-Schmidt (GS) process, guaranteeing the orthogonality of the feature space; and (3) it adopts random projection to speed up the GS process. Extensive experimental results show that our proposed RP-GSO approach achieves better performance compared with supervised and unsupervised feature selection methods in text classification and clustering tasks. (C) 2015 Elsevier B.V. All rights reserved.
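
The abstract names the three ingredients of RP-GSO but not its details, so the following is only a rough sketch of how they might fit together, not the authors' implementation. It assumes the co-occurrence matrix is computed as X^T X from a document-term matrix X, that the random projection is Gaussian, and that "basis features" are chosen greedily by largest residual norm during Gram-Schmidt. The function name `rp_gso_sketch` and its parameters are hypothetical.

```python
import numpy as np

def rp_gso_sketch(X, n_select, proj_dim=64, seed=0):
    """Return indices of selected terms (columns of X). Illustrative only."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)

    # Assumption: word co-occurrence matrix (terms x terms) taken as X^T X.
    C = X.T @ X
    n_terms = C.shape[0]

    # Assumption: Gaussian random projection of each term's co-occurrence
    # vector (a column of C) down to proj_dim dimensions.
    R = rng.standard_normal((proj_dim, n_terms)) / np.sqrt(proj_dim)
    V = R @ C  # shape (proj_dim, n_terms); column j represents term j

    available = np.ones(n_terms, dtype=bool)
    tol = 1e-10 * max(np.linalg.norm(V, axis=0).max(), 1.0)
    selected = []

    for _ in range(min(n_select, n_terms, proj_dim)):
        # Assumption: pick the unselected term with the largest residual norm.
        norms = np.where(available, np.linalg.norm(V, axis=0), -1.0)
        j = int(np.argmax(norms))
        if norms[j] <= tol:
            break  # remaining terms lie in the span of those already chosen

        # Gram-Schmidt step: remove the chosen direction from every column.
        q = V[:, j] / norms[j]
        V = V - np.outer(q, q @ V)

        selected.append(j)
        available[j] = False

    return selected
```

Under these assumptions, a call such as `rp_gso_sketch(X, n_select=100)` returns column indices of X, and terms whose co-occurrence vectors duplicate an already selected term are skipped because their residual drops to zero.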