Should I use PCA with categorical data?
Can principal component analysis be applied to data sets that contain a mixture of continuous and categorical variables?
Although a PCA applied to binary data would yield results comparable to those of a multiple correspondence analysis (factor scores and eigenvalues are linearly related), there are more appropriate techniques for dealing with mixed data types, namely Multiple Factor Analysis for mixed data, available in the FactoMineR R package. If your variables can be thought of as structured subsets of descriptive attributes, Multiple Factor Analysis is also an option.
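To make the mixed-data idea concrete, here is a minimal NumPy sketch of a factor analysis in the spirit of the mixed-data method mentioned above. It is not FactoMineR's code. It assumes the commonly described weighting: continuous columns are standardized, and each indicator column is divided by the square root of its category proportion and then centered, so that every variable contributes on a comparable scale. The function name `famd_like_scores` and the toy data are illustrative assumptions.

```python
import numpy as np

def famd_like_scores(X_num, X_cat, n_components=2):
    """Sketch of a PCA on mixed data (assumed FAMD-style weighting).

    X_num: (n, p) float array of continuous variables.
    X_cat: (n, q) array of category labels.
    """
    n = X_num.shape[0]
    # Standardize continuous columns to mean 0, variance 1.
    blocks = [(X_num - X_num.mean(axis=0)) / X_num.std(axis=0)]
    for j in range(X_cat.shape[1]):
        levels = np.unique(X_cat[:, j])
        # One indicator (dummy) column per category level.
        indicators = (X_cat[:, j][:, None] == levels[None, :]).astype(float)
        props = indicators.mean(axis=0)
        # Down-weight frequent categories, then center.
        scaled = indicators / np.sqrt(props)
        blocks.append(scaled - scaled.mean(axis=0))
    Z = np.hstack(blocks)
    # Ordinary PCA of the weighted table via SVD.
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    eigenvalues = s**2 / n
    return scores, eigenvalues

# Toy data (hypothetical): two continuous variables, one 3-level factor.
rng = np.random.default_rng(0)
X_num = rng.normal(size=(30, 2))
X_cat = np.array(["a", "b", "c"] * 10).reshape(-1, 1)
scores, eigenvalues = famd_like_scores(X_num, X_cat)
```

With this weighting, each continuous variable contributes one unit of inertia and a categorical variable with K levels contributes K - 1, which is what keeps the two kinds of variables balanced.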
The challenge with categorical variables is to find a suitable way to represent distances between variable categories and individuals in the factorial space. To solve this problem, you can look for a nonlinear transformation of each variable, be it nominal, ordinal, polynomial, or numerical, with optimal scaling. This is explained in detail in Gifi Methods for Optimal Scaling in R: The Package homals. An implementation is available in the corresponding R package homals.
A Google search for "pca for discrete variables" gives this nice overview by S. Kolenikov (@StasK) and G. Angeles. To add to the answer above: PCA is really an analysis of the eigenvectors of the covariance matrix, so the problem is how to compute the "correct" covariance matrix. One approach is to use polychoric correlations.
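As a rough illustration of the point that PCA only needs a covariance (or correlation) matrix, the sketch below eigen-decomposes whatever correlation matrix it is given. The toy ordinal data and the function name `pca_from_correlation` are assumptions for illustration. The matrix used here is the plain Pearson correlation of the ordinal codes; a polychoric correlation matrix estimated elsewhere could be passed in instead, since NumPy does not provide one.

```python
import numpy as np

def pca_from_correlation(R):
    """PCA from a symmetric correlation matrix R (any estimator)."""
    eigvals, eigvecs = np.linalg.eigh(R)
    # eigh returns ascending eigenvalues; reorder to descending.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Loadings: eigenvectors scaled by the square root of their eigenvalue.
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    explained = eigvals / eigvals.sum()
    return eigvals, loadings, explained

# Toy ordinal data (hypothetical): correlated normals cut into 3 ordered levels.
rng = np.random.default_rng(1)
cov = [[1.0, 0.6, 0.3], [0.6, 1.0, 0.4], [0.3, 0.4, 1.0]]
latent = rng.multivariate_normal([0.0, 0.0, 0.0], cov, size=500)
ordinal = np.digitize(latent, bins=[-0.5, 0.5])

# Pearson correlation of the codes; swap in a polychoric estimate here.
R = np.corrcoef(ordinal, rowvar=False)
eigvals, loadings, explained = pca_from_correlation(R)
```

The only step that changes between ordinary PCA and the polychoric variant is how `R` is computed; the decomposition itself is identical.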
I recommend a look at Linting & van der Kooij (2012), "Nonlinear Principal Components Analysis with CATPCA: A Tutorial", Journal of Personality Assessment, 94(1).
Serving as a tutorial on nonlinear principal components analysis (NLPCA), this article systematically guides the reader through the process of analyzing actual data on personality assessment by the Rorschach Inkblot Test. NLPCA is a more flexible alternative to linear PCA that can handle the analysis of possibly nonlinearly related variables with different types of measurement level. The method is particularly suited to analyzing nominal (qualitative) and ordinal (e.g., Likert-type) data, possibly combined with numeric data. The CATPCA program from the Categories module in SPSS is used in the analyses, but the method description can easily be generalized to other software packages.
I don't have permission to comment on a post yet, so I'm adding my comment as a separate answer; please bear with me.
Following up on @Martin F's comment, I recently came across nonlinear PCA. I looked into it as a possible alternative for cases where a continuous variable approaches the distribution of an ordinal variable as the data become sparser. This happens a lot in genetics: as the minor allele frequency of a variable gets lower and lower, you are left with a very low number of counts, you can't really justify a continuous distribution, and you have to relax the distributional assumptions by treating the variable as either ordinal or categorical. Nonlinear PCA is not used often, and its behavior has not yet been tested extensively (this view may only reflect the genetics field, so please take it with a grain of salt). Still, it's a fascinating option. I hope I have added my two (hopefully relevant) cents to the discussion.
There is a recently developed approach to such problems: generalized low rank models (GLRM).
One of the works using this technique is even called PCA on a data frame.
PCA can be posed like this:
For an $n \times m$ matrix $M$,
find an $n \times k$ matrix $\hat{X}$ and a $k \times m$ matrix $\hat{Y}$ (this implicitly encodes the rank-$k$ constraint) such that
$$\hat{X}, \hat{Y} = \operatorname*{argmin}_{X, Y} \|M - XY\|_F^2.$$
The 'generalized' in GLRM stands for changing $\|\cdot\|_F^2$ to something else and adding a regularization term.
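The plain-PCA case of this factorization can be sketched in a few lines of NumPy. For the squared Frobenius loss with no regularization, a truncated SVD gives a minimizer (the Eckart-Young theorem), so no iterative solver is needed. The function name `low_rank_pca` and the random test matrix are illustrative assumptions, and this covers only the quadratic-loss special case, not a general GLRM solver.

```python
import numpy as np

def low_rank_pca(M, k):
    """Return X (n x k) and Y (k x m) minimizing ||M - XY||_F^2."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    X = U[:, :k] * s[:k]   # n x k: component scores
    Y = Vt[:k, :]          # k x m: principal directions
    return X, Y

# Toy matrix (hypothetical), column-centered as is usual for PCA.
rng = np.random.default_rng(0)
M = rng.normal(size=(20, 5))
M = M - M.mean(axis=0)

k = 2
X, Y = low_rank_pca(M, k)
residual = np.linalg.norm(M - X @ Y, "fro") ** 2

# The residual equals the sum of the squared discarded singular values.
singular_values = np.linalg.svd(M, compute_uv=False)
expected_residual = (singular_values[k:] ** 2).sum()
```

Once the squared loss is replaced by losses suited to categorical or ordinal columns, this closed form no longer applies, and the factors are typically fitted by alternating minimization over `X` and `Y`.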
# Rstats package:
Implements principal component analysis, orthogonal rotation, and multiple factor analysis for a mixture of quantitative and qualitative variables.
The example from the vignette shows results for both continuous and categorical output.