This package augments Gretl's built-in pca command.
The package is a collection of functions for conducting Principal Component Analysis (PCA). It ships two plotting functions: one for creating a so-called 'scree plot' (https://en.wikipedia.org/wiki/Scree_plot) and one for a bi-plot (https://en.wikipedia.org/wiki/Biplot).
Furthermore, it supports the computation of sparse PCs, meaning that some loading coefficients may be exactly zero. The loadings are estimated by the forward-stagewise boosting algorithm (Tibshirani; for details see the fsboost Gretl package), which is similar to the Lasso. Currently, however, only the loadings and scores are based on the sparse PC estimates; the estimated variances are not.
Please report bugs or comments on the gretl mailing list, open an issue on GitHub (https://github.com/atecon/pcaTools/issues), or write to atecon@posteo.de.
Download the gfn file from https://github.com/atecon/pcaTools/blob/main/src/pcaTools.gfn
To install the package, run the following Gretl command:
pkg install /path/to/pcaTools.gfn --local
pcaEst(const list X, bundle opts[null])
This function performs Principal Component Analysis (PCA) on the input data.
X: list, Variables on which to conduct PCA.
opts: bundle, Optional bundle passing parameters.
opts can include the following parameters for setting options:
do_stdize: bool, Centre variables and divide by their respective standard deviation (default: TRUE)
use_vcv: bool, Compute principal components based on the variance-covariance matrix if TRUE, otherwise use the correlation matrix (default: TRUE)
verbose: bool, Make output more verbose if TRUE (default: FALSE)
A bundle self containing the results of the PCA.
pcaPrint(const bundle self)
This function prints the results of a PCA analysis.
self: bundle, Returned information from the pcaEst() function.
No return value. This function prints the PCA results to the console.
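As a minimal sketch of how estimation and printing fit together, the following hansl script calls pcaEst() and then pcaPrint(). The dataset (mroz87.gdt, assumed to be available as one of the sample files shipped with gretl) and the choice of variables are illustrative assumptions only. The boolean options are passed as 1/0.

```hansl
clear
set verbose off
include pcaTools.gfn

# Assumption: mroz87.gdt is available as a gretl sample dataset
open mroz87.gdt --quiet

# Illustrative choice of variables (not prescribed by the package)
list X = WA WE FAMINC HHRS

# Options as documented above; 1 = TRUE, 0 = FALSE
bundle opts = defbundle("do_stdize", 1, "verbose", 0)

# Estimate the principal components and print the results
bundle self = pcaEst(X, opts)
pcaPrint(self)
```

Since opts is optional, pcaEst(X) alone should run with the documented defaults.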
pcaScreeplot(const bundle self, const string filename[null])
This function generates a scree plot from the results of a PCA analysis. A scree plot is a line plot of the eigenvalues of factors or principal components in an analysis.
self: A bundle containing the results of a PCA, typically returned by pcaEst.
filename: A string referring to the PATH+FILENAME for storing the plot (optional). If no string is passed, the plot appears on the screen immediately.
One can tweak the plot by passing specific parameters to the bundle self before calling pcaScreeplot(). The following parameters are supported:
fontsize: Size of font (default: 10)
linedwidth: Width of the line (default: 1.5)
No return value. This function creates a scree plot.
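The following sketch shows one way the plot parameters might be set on the result bundle before calling pcaScreeplot(). The dataset, the variable list and the output file name scree.png are assumptions chosen for illustration.

```hansl
clear
set verbose off
include pcaTools.gfn

# Assumption: mroz87.gdt is available as a gretl sample dataset
open mroz87.gdt --quiet
list X = WA WE FAMINC HHRS      # illustrative variable choice

bundle self = pcaEst(X)

# Tweak the plot by adding the documented parameters to the bundle
self.fontsize = 12
self.linedwidth = 2

# Without a filename the plot is shown on screen
pcaScreeplot(self)

# With a filename the plot is stored instead (hypothetical file name)
pcaScreeplot(self, "scree.png")
```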
pcaBiplot(const bundle self, const string filename[null])
This function generates a matrix of bi-plots, one for each pairwise combination of the computed principal components. A bi-plot is a plot of two principal components that shows the scores together with the loadings of each principal component.
self: A bundle containing the results of a PCA, typically returned by pcaEst.
filename: A string referring to the PATH+FILENAME for storing the plot (optional). If no string is passed, the plot appears on the screen immediately.
No return value. This function generates a bi-plot.
One can tweak the plot by passing specific parameters to the bundle self before calling pcaBiplot. The following parameters are supported:
centre_biplot: bool, Centre the axes if TRUE, otherwise not (default: TRUE).
cols_biplot: int, Number of columns of gridplot (default: NA -> automatically set)
color_arrow: string, Color of the arrows depicting the eigenvector (default: "web-blue")
color_pattern: string, Color pattern for (factorized) data points of biplot. Either "dark2" or "default" (default: "dark2").
factor: series, Distinct values for factorized bi-plot (default: none)
fontsize: int, Size of font for the title and variable names (default: 12)
fontsize_arrow: int, Size of font of the arrow labels (default: 12)
fontsize_key: int, Size of font of the key/legend (default: 8)
height_biplot: int, Height of biplot (default: 600)
linedwidth: scalar, Width of the line (default: 1.5)
linedwidth_arrow: scalar, Width of the lines for the bi-plot arrows (default: 1.0)
n_pcs_to_plot: int, Number of first principal components to plot (default: all)
offset_label_x: scalar, Offset of labels for arrows along x-axis (default: 0)
offset_label_y: scalar, Offset of labels for arrows along y-axis (default: 0)
pointtype: int, Point type (default: 4)
pointsize: scalar, Size of point (default: 1.0)
rows_biplot: int, Number of rows of gridplot (default: NA -> automatically set)
sparse_pca: bool, If true, compute sparse PCA, otherwise non-sparse version (default: FALSE)
transparency: int, The rgbalpha plotting style assumes that each pixel of input data contains an alpha value in the range [0:255] (no transparency:full transparency). Currently, only applied to the 1st factor.
width_biplot: int, Width of biplot (default: 600)
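A few of the parameters listed above can be combined as in the following sketch. The dataset and variable list are assumptions for illustration, and only a small subset of the documented parameters is set. All others keep their defaults.

```hansl
clear
set verbose off
include pcaTools.gfn

# Assumption: mroz87.gdt is available as a gretl sample dataset
open mroz87.gdt --quiet
list X = WA WE FAMINC HHRS      # illustrative variable choice

bundle self = pcaEst(X)

# Restrict the plot to the first two principal components
self.n_pcs_to_plot = 2

# Adjust appearance using the documented parameter names
self.centre_biplot = 1          # 1 = TRUE
self.color_arrow = "web-blue"
self.fontsize_key = 10
self.pointsize = 0.8

# Without a filename the bi-plot is shown on screen
pcaBiplot(self)
```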
- v0.2 (February 2024)
  - Introduce sparse regression-based PCA using the forward-stagewise boosting algorithm for feature selection
  - Improve plotting the loadings: put on the secondary axis
  - New package dependence: 'fsboost' package
  - Make font size of the key adjustable
- v0.1 (January 2024)
  - Initial version