Vendor : SAS Institute



Format: PDF

Date: 15/02/2008


Two-Stage Variable Clustering for Large Data Sets


Overview

In data mining, principal component analysis is a popular dimension reduction technique. It also provides a good remedy for the multicollinearity problem, but the resulting components are hard to interpret in terms of the original input variables. To overcome this interpretation problem, principal components (cluster components) can instead be obtained through variable clustering, which is implemented in PROC VARCLUS. The procedure uses oblique principal component analysis and iterative binary splits to cluster the variables, and it produces non-orthogonal principal components. Although the procedure sacrifices orthogonality among the principal components, it yields components that are easy to interpret and cluster structures that are well explained by them. However, the PROC VARCLUS implementation is inefficient for high-dimensional data. This paper introduces a two-stage variable clustering technique for large data sets.
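
To make the idea of a cluster component concrete, the following is a minimal Python sketch, not the PROC VARCLUS algorithm and not the paper's two-stage method. It assumes the variable clusters are already known, builds synthetic data, and takes the first principal component of each cluster's correlation matrix as that cluster's component. The function name cluster_component, the cluster labels, and the data are hypothetical and chosen only for illustration.

```python
import numpy as np


def cluster_component(X, columns):
    """Return the first principal component scores of the standardized
    columns in one cluster, plus the proportion of variance explained."""
    Z = X[:, columns]
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)
    corr = np.atleast_2d(np.corrcoef(Z, rowvar=False))
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    w = eigvecs[:, -1]                       # loadings of the largest eigenvalue
    if w.sum() < 0:                          # fix the arbitrary sign of the eigenvector
        w = -w
    return Z @ w, eigvals[-1] / len(columns)


# Synthetic data (an assumption for illustration): two correlated latent
# factors, each observed through three noisy variables.
rng = np.random.default_rng(0)
n = 500
f1 = rng.normal(size=n)
f2 = 0.5 * f1 + rng.normal(size=n)
X = np.column_stack(
    [f1 + 0.3 * rng.normal(size=n) for _ in range(3)]
    + [f2 + 0.3 * rng.normal(size=n) for _ in range(3)]
)

# Cluster membership is assumed to be given here; a real procedure
# would have to find it, for example by repeated binary splits.
clusters = {"A": [0, 1, 2], "B": [3, 4, 5]}

scores = {}
explained = {}
for name, cols in clusters.items():
    scores[name], explained[name] = cluster_component(X, cols)

# Correlation between the two cluster components.
r = np.corrcoef(scores["A"], scores["B"])[0, 1]
print("variance explained per cluster:", explained)
print("correlation between cluster components:", r)
```

Each component in this sketch is built from one group of variables only, which is what makes it easy to read, and the two components come out correlated with each other, which illustrates the loss of orthogonality described above.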