From 26ce7747cf7933d0fd87a7064ebcd62ec4a95ed9 Mon Sep 17 00:00:00 2001 From: Shu Fai Cheung Date: Thu, 31 Oct 2024 20:43:06 +0800 Subject: [PATCH 01/11] Finalize the vignettes --- vignettes/betaselectr_glm.Rmd | 64 ++++++++++++----------- vignettes/betaselectr_glm.Rmd.original | 62 ++++++++++++----------- vignettes/betaselectr_lav.Rmd | 70 +++++++++++++------------- vignettes/betaselectr_lav.Rmd.original | 66 ++++++++++++------------ vignettes/betaselectr_lm.Rmd | 62 ++++++++++++----------- vignettes/betaselectr_lm.Rmd.original | 60 +++++++++++----------- vignettes/references.bib | 57 +++++++++++++++++++++ 7 files changed, 258 insertions(+), 183 deletions(-) diff --git a/vignettes/betaselectr_glm.Rmd b/vignettes/betaselectr_glm.Rmd index aaca503..7e6f7df 100644 --- a/vignettes/betaselectr_glm.Rmd +++ b/vignettes/betaselectr_glm.Rmd @@ -1,6 +1,6 @@ --- title: "Beta-Select Demonstration: Logistic Regression by `glm()`" -date: "2024-10-30" +date: "2024-10-31" output: rmarkdown::html_vignette: number_sections: true @@ -20,11 +20,13 @@ csl: apa.csl This article demonstrates how to use `glm_betaselect()` from the package -`betaselectr` +[`betaselectr`](https://sfcheung.github.io/betaselectr/) to standardize selected variables in a model fitted by `glm()` and forming confidence intervals for the parameters. +Logistic regression is used in this +illustration. # Data and Model @@ -45,7 +47,7 @@ head(data_test_mod_cat_binary) #> 6 0 13.14 48.65 21.03 gp3 ``` -This is the regression model, fitted by +This is the logistic regression model, fitted by `glm()`: @@ -95,7 +97,8 @@ summary(glm_out) # Problems With Standardization In logistic regression, there are several -ways to do standardization. We use the +ways to do standardization [@menard_six_2004]. +We use the same approach in linear regression and standardize all variables, except for the binary response variable. @@ -175,20 +178,22 @@ However, for this model, there are several problems: - The product term, `iv:mod`, is also - standardized (`iv_x_mod` in this model). + standardized (`iv_x_mod`, computed + using the standard deviations of + `dv` and `iv:mod`). This is inappropriate. One simple but underused solution is standardizing the variables *before* forming the product term [see @friedrich_defense_1982 on the case of linear regression]. -- The confidence intervals are formed using +- The default confidence intervals are formed using profiling in `glm()`. It does allow for asymmetry. However, it does not take into account the sampling variation of the standardizers (the sample standard deviations used in standardization). - It is not clear whether it will be + It is unclear whether it will be biased, as in the case of OLS standard error [@yuan_biases_2011]. @@ -222,7 +227,7 @@ to solve these problems by: into account selected standardization. We call the coefficients computed by -this kind of standardization *beta*s-Select +this kind of standardization *beta*s-select ($\beta{s}_{Select}$, $\beta_{Select}$ in singular form), to differentiate them from coefficients @@ -272,7 +277,7 @@ string variables) will not be standardized. Bootstrapping is done by default. In this illustration, `do_boot = FALSE` is added -to disabled it because we only want to +to disable it because we only want to address the first problem. We will do bootstrapping when addressing the issue with confidence intervals. 
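The logic of those bootstrap confidence intervals can be sketched with a few lines of base R. The block below is only an illustrative sketch (the seed, the 2000 replications, and the choice of predictors to standardize are arbitrary): each bootstrap sample is re-standardized before the model is refitted, so the sampling variation of the standard deviations used as standardizers is carried into the percentile interval. Requesting bootstrapping from `glm_betaselect()`, as done later, takes care of this automatically.

``` r
# Illustrative sketch only: a manual percentile bootstrap confidence
# interval for the beta-select of `iv`, re-standardizing the selected
# predictors within each bootstrap sample before refitting.
set.seed(1234)
dat <- data_test_mod_cat_binary
boot_iv <- replicate(2000, {
  d <- dat[sample(nrow(dat), replace = TRUE), ]
  d[c("iv", "mod", "cov1")] <- lapply(d[c("iv", "mod", "cov1")],
                                      function(x) as.numeric(scale(x)))
  coef(glm(dv ~ iv * mod + cov1 + cat1,
           data = d,
           family = binomial()))["iv"]
})
quantile(boot_iv, c(.025, .975))
```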
@@ -346,7 +351,7 @@ term standardized, the coefficient of to the case of linear regression [@cheung_improving_2022], the coefficient of *standardized* product term (`iv:mod`) -can be very different from the +can be substantially different from the properly standardized product term (the product of standardized `iv` and standardized `mod`). @@ -355,10 +360,14 @@ standardized `mod`). Suppose we want to address both the first and the second problems, -with the product -term computed after `iv` and `mod` are -standardized and bootstrap confidence -interval used, we can call `glm_betaselect()` +with + +- the product term computed after `iv` and `mod` are +standardized, and + +- bootstrap confidence interval used. + +We can call `glm_betaselect()` again, with additional arguments set: @@ -445,7 +454,7 @@ By default, 95% percentile bootstrap confidence intervals are printed (`CI.Lower` and `CI.Upper`). The *p*-values (`Pr(Boot)`) are asymmetric bootstrap -*p*-values. +*p*-values [@asparouhov_bootstrap_2021]. ## Estimates and Bootstrap Confidence Intervals, With Only Selected Variables Standardized @@ -456,13 +465,13 @@ done using either `to_standardize` or `not_to_standardize`. - Use `to_standardize` when -the variables to standardize -is much fewer than the variables -not to standardize. +the number of variables to standardize +is much fewer than number of the variables +not to standardize - Use `not_to_standardize` -when the variables to standardize -is much more than the +when the number of variables to standardize +is much more than the number of variables not to standardize. For example, suppose we only @@ -484,15 +493,16 @@ glm_beta_select_boot_1 <- glm_betaselect(dv ~ iv*mod + cov1 + cat1, ``` If we want to standardize all -variables except for `dv`, `mod`, we can use +variables except for `mod` (`dv` +is skipped by `skip_response`) we can use this call, and set -`not_to_standardize` to `c("dv", "mod")`: +`not_to_standardize` to `"mod"`: ``` r glm_beta_select_boot_2 <- glm_betaselect(dv ~ iv*mod + cov1 + cat1, data = data_test_mod_cat_binary, - not_to_standardize = c("dv", "mod"), + not_to_standardize = c("mod"), skip_response = TRUE, family = binomial(), bootstrap = 5000, @@ -569,11 +579,7 @@ This can be done in table notes. When calling `glm_betaselect()`, categorical variables (factors and -string variables) will not be standardized -by default. -This can be overriden by setting -`skip_categorical_x` to `FALSE`, though -not recommended. +string variables) will never be standardized. In the example above, the coefficients of the two dummy variables when both @@ -599,7 +605,7 @@ printCoefmat(glm_std_common_summary$coefficients[5:6, ], These two values are not interpretable because it does not make sense to talk -about a one-SD change in the dummy variables. +about a "one-SD change" in the dummy variables. # Conclusion diff --git a/vignettes/betaselectr_glm.Rmd.original b/vignettes/betaselectr_glm.Rmd.original index 0900d47..cfe64a0 100644 --- a/vignettes/betaselectr_glm.Rmd.original +++ b/vignettes/betaselectr_glm.Rmd.original @@ -30,11 +30,13 @@ format_str <- function(x, digits = 3) { This article demonstrates how to use `glm_betaselect()` from the package -`betaselectr` +[`betaselectr`](https://sfcheung.github.io/betaselectr/) to standardize selected variables in a model fitted by `glm()` and forming confidence intervals for the parameters. +Logistic regression is used in this +illustration. 
# Data and Model @@ -47,7 +49,7 @@ library(betaselectr) head(data_test_mod_cat_binary) ``` -This is the regression model, fitted by +This is the logistic regression model, fitted by `glm()`: ```{r} @@ -71,7 +73,8 @@ summary(glm_out) # Problems With Standardization In logistic regression, there are several -ways to do standardization. We use the +ways to do standardization [@menard_six_2004]. +We use the same approach in linear regression and standardize all variables, except for the binary response variable. @@ -123,20 +126,22 @@ However, for this model, there are several problems: - The product term, `iv:mod`, is also - standardized (`iv_x_mod` in this model). + standardized (`iv_x_mod`, computed + using the standard deviations of + `dv` and `iv:mod`). This is inappropriate. One simple but underused solution is standardizing the variables *before* forming the product term [see @friedrich_defense_1982 on the case of linear regression]. -- The confidence intervals are formed using +- The default confidence intervals are formed using profiling in `glm()`. It does allow for asymmetry. However, it does not take into account the sampling variation of the standardizers (the sample standard deviations used in standardization). - It is not clear whether it will be + It is unclear whether it will be biased, as in the case of OLS standard error [@yuan_biases_2011]. @@ -170,7 +175,7 @@ to solve these problems by: into account selected standardization. We call the coefficients computed by -this kind of standardization *beta*s-Select +this kind of standardization *beta*s-select ($\beta{s}_{Select}$, $\beta_{Select}$ in singular form), to differentiate them from coefficients @@ -219,7 +224,7 @@ string variables) will not be standardized. Bootstrapping is done by default. In this illustration, `do_boot = FALSE` is added -to disabled it because we only want to +to disable it because we only want to address the first problem. We will do bootstrapping when addressing the issue with confidence intervals. @@ -243,7 +248,7 @@ term standardized, the coefficient of to the case of linear regression [@cheung_improving_2022], the coefficient of *standardized* product term (`iv:mod`) -can be very different from the +can be substantially different from the properly standardized product term (the product of standardized `iv` and standardized `mod`). @@ -252,10 +257,14 @@ standardized `mod`). Suppose we want to address both the first and the second problems, -with the product -term computed after `iv` and `mod` are -standardized and bootstrap confidence -interval used, we can call `glm_betaselect()` +with + +- the product term computed after `iv` and `mod` are +standardized, and + +- bootstrap confidence interval used. + +We can call `glm_betaselect()` again, with additional arguments set: @@ -289,7 +298,7 @@ By default, 95% percentile bootstrap confidence intervals are printed (`CI.Lower` and `CI.Upper`). The *p*-values (`Pr(Boot)`) are asymmetric bootstrap -*p*-values. +*p*-values [@asparouhov_bootstrap_2021]. ## Estimates and Bootstrap Confidence Intervals, With Only Selected Variables Standardized @@ -300,13 +309,13 @@ done using either `to_standardize` or `not_to_standardize`. - Use `to_standardize` when -the variables to standardize -is much fewer than the variables -not to standardize. 
+the number of variables to standardize +is much fewer than number of the variables +not to standardize - Use `not_to_standardize` -when the variables to standardize -is much more than the +when the number of variables to standardize +is much more than the number of variables not to standardize. For example, suppose we only @@ -327,14 +336,15 @@ glm_beta_select_boot_1 <- glm_betaselect(dv ~ iv*mod + cov1 + cat1, ``` If we want to standardize all -variables except for `dv`, `mod`, we can use +variables except for `mod` (`dv` +is skipped by `skip_response`) we can use this call, and set -`not_to_standardize` to `c("dv", "mod")`: +`not_to_standardize` to `"mod"`: ```{r, results = FALSE} glm_beta_select_boot_2 <- glm_betaselect(dv ~ iv*mod + cov1 + cat1, data = data_test_mod_cat_binary, - not_to_standardize = c("dv", "mod"), + not_to_standardize = c("mod"), skip_response = TRUE, family = binomial(), bootstrap = 5000, @@ -358,11 +368,7 @@ This can be done in table notes. When calling `glm_betaselect()`, categorical variables (factors and -string variables) will not be standardized -by default. -This can be overriden by setting -`skip_categorical_x` to `FALSE`, though -not recommended. +string variables) will never be standardized. In the example above, the coefficients of the two dummy variables when both @@ -382,7 +388,7 @@ printCoefmat(glm_std_common_summary$coefficients[5:6, ], These two values are not interpretable because it does not make sense to talk -about a one-SD change in the dummy variables. +about a "one-SD change" in the dummy variables. # Conclusion diff --git a/vignettes/betaselectr_lav.Rmd b/vignettes/betaselectr_lav.Rmd index 6603565..f723cae 100644 --- a/vignettes/betaselectr_lav.Rmd +++ b/vignettes/betaselectr_lav.Rmd @@ -1,6 +1,6 @@ --- title: "Beta-Select Demonstration: SEM by 'lavaan'" -date: "2024-10-06" +date: "2024-10-31" output: rmarkdown::html_vignette: number_sections: true @@ -8,6 +8,8 @@ vignette: > %\VignetteIndexEntry{Beta-Select Demonstration: SEM by 'lavaan'} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} +bibliography: "references.bib" +csl: apa.csl --- @@ -18,7 +20,7 @@ vignette: > This article demonstrates how to use `lav_betaselect()` from the package -`betaselectr` +[`betaselectr`](https://sfcheung.github.io/betaselectr/) to standardize selected variables in a model fitted by `lavaan` and forming confidence @@ -27,7 +29,7 @@ intervals for the parameters. # Data and Model The sample dataset from the package -`betaselectr` will be used for in this +`betaselectr` will be used in this demonstration: @@ -49,6 +51,8 @@ This is the path model, fitted by ``` r library(lavaan) +#> This is lavaan 0.6-19 +#> lavaan is FREE software! Please report any bugs. mod <- " med ~ iv + mod + iv:mod + cov1 + cov2 @@ -144,14 +148,14 @@ problems: standardized. This is inappropriate. One simple but underused solution is standardized the variables *before* - forming the product term (Friedrich, 1982). + forming the product term [@friedrich_defense_1982]. - The confidence intervals are formed using the delta-method, which has been found to be inferior to methods such as nonparametric percentile bootstrap confidence interval for the standardized - solution (Falk, 2018). Although there + solution [@falk_are_2018]. Although there are situations in which the delta-method confidence and the nonparametric percentile bootstrap confidences can be @@ -165,7 +169,7 @@ problems: do not need to be standardized. 
for example, if `cov1` is age measured by year, then age is more - meaningful than the "standardized age". + meaningful than "standardized age". - In path analysis, categorical variables are usually represented by dummy variables, @@ -178,18 +182,18 @@ problems: The function `lav_betaselect()` can be used to solve this problem by: -- Standardizing variables before product - terms are formed. +- standardizing variables before product + terms are formed, -- Standardizing only variables for which +- standardizing only variables for which standardization can facilitate - interpretation. + interpretation, and -- Forming confidence intervals that take +- forming confidence intervals that take into account selected standardization. We call the coefficients computed by -this kind of standardization *beta*s-Select +this kind of standardization *beta*s-select ($\beta{s}_{Select}$, $\beta_{Select}$ in singular form), to differentiate them from coefficients @@ -261,9 +265,9 @@ term standardized, the coefficient of `iv:mod` changed substantially from 3.588 to 0.286. As shown by -Cheung et al. (2022), the coefficient +@cheung_improving_2022, the coefficient of *standardized* product term (`iv:mod`) -can be severely biased estimate of the +can be substantially different from the properly standardized product term (the product of standardized `iv` and standardized `mod`). @@ -277,10 +281,16 @@ are formed *after* standardization. Suppose we want to address both the first and the second problems, -with the product -term computed after `iv` and `mod` -standardized and bootstrap confidence -interval used, we can call `lav_betaselect()` +with + +- the product term computed after `iv` and `mod` + standardized, and + +- bootstrap confidence intervals used, that + take into account the sampling variation + of the standardizers (the standard deviations). + +We can call `lav_betaselect()` again, with additional arguments set: @@ -396,14 +406,14 @@ done using either `to_standardize` or `not_to_standardize`. - Use `to_standardize` when -the variables to standardize -is much fewer than the variables +the number of variables to standardize +is much fewer than the number of variables not to standardize. - Use `not_to_standardize` -when the variables to standardize +when the number variables to standardize is much more than the -variables not to standardize. +the number of variables not to standardize. For example, suppose we only need to standardize `dv` and @@ -441,7 +451,7 @@ fit_beta_select_2 <- lav_betaselect(fit, ``` The results of these calls are identical, -and only those of the first version are +and only those of the second version are printed: @@ -491,14 +501,14 @@ fit_beta_select_2 #> them. The product term(s) is/are not standardized. ``` -The footnotes confirmed that, by +The footnotes show that, by specifying that `dv` and `mod` are not standardized, all the other four variables are standardized: `iv`, `med`, `cov1`, and `cov2`. Therefore, in this case, it is more convenient to use `not_to_standardize`. -For *beta*s-*select*, researchers need +When reporting *beta*s-*select*, researchers need to state which variables are standardized and which are not. This can be done in table notes, @@ -567,7 +577,7 @@ before forming product terms. We are not aware of tools that can do appropriate standardization *and* form confidence intervals that takes into account the -selective Standardization. By promoting +selective standardization. 
By promoting the use of *beta*s-*select* using `lav_betaselect()`, we hope to make it easier for researchers to do appropriate @@ -575,11 +585,3 @@ Standardization in when reporting SEM results. # References - -Cheung, S. F., Cheung, S.-H., Lau, E. Y. Y., Hui, C. H., & Vong, W. N. (2022). Improving an old way to measure moderation effect in standardized units. *Health Psychology, 41*(7), 502--505. https://doi.org/10.1037/hea0001188 - -Falk, C. F. (2018). Are robust standard errors the best approach for interval estimation with nonnormal data in structural equation modeling? *Structural Equation Modeling: A Multidisciplinary Journal, 25*(2), 244--266. https://doi.org/10.1080/10705511.2017.1367254 - -Friedrich, R. J. (1982). In defense of multiplicative terms in multiple regression equations. *American Journal of Political Science, 26*(4), 797--833. https://doi.org/10.2307/2110973 - - diff --git a/vignettes/betaselectr_lav.Rmd.original b/vignettes/betaselectr_lav.Rmd.original index f72af37..eda0075 100644 --- a/vignettes/betaselectr_lav.Rmd.original +++ b/vignettes/betaselectr_lav.Rmd.original @@ -8,6 +8,8 @@ vignette: > %\VignetteIndexEntry{Beta-Select Demonstration: SEM by 'lavaan'} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} +bibliography: "references.bib" +csl: apa.csl --- ```{r, include = FALSE} @@ -28,7 +30,7 @@ format_str <- function(x, digits = 3) { This article demonstrates how to use `lav_betaselect()` from the package -`betaselectr` +[`betaselectr`](https://sfcheung.github.io/betaselectr/) to standardize selected variables in a model fitted by `lavaan` and forming confidence @@ -37,7 +39,7 @@ intervals for the parameters. # Data and Model The sample dataset from the package -`betaselectr` will be used for in this +`betaselectr` will be used in this demonstration: ```{r} @@ -86,14 +88,14 @@ problems: standardized. This is inappropriate. One simple but underused solution is standardized the variables *before* - forming the product term (Friedrich, 1982). + forming the product term [@friedrich_defense_1982]. - The confidence intervals are formed using the delta-method, which has been found to be inferior to methods such as nonparametric percentile bootstrap confidence interval for the standardized - solution (Falk, 2018). Although there + solution [@falk_are_2018]. Although there are situations in which the delta-method confidence and the nonparametric percentile bootstrap confidences can be @@ -107,7 +109,7 @@ problems: do not need to be standardized. for example, if `cov1` is age measured by year, then age is more - meaningful than the "standardized age". + meaningful than "standardized age". - In path analysis, categorical variables are usually represented by dummy variables, @@ -120,18 +122,18 @@ problems: The function `lav_betaselect()` can be used to solve this problem by: -- Standardizing variables before product - terms are formed. +- standardizing variables before product + terms are formed, -- Standardizing only variables for which +- standardizing only variables for which standardization can facilitate - interpretation. + interpretation, and -- Forming confidence intervals that take +- forming confidence intervals that take into account selected standardization. 
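For concreteness, the first two points correspond roughly to the manual steps sketched below. This is only an illustration: `dat` is a placeholder for the data set used above, and standardizing `iv`, `med`, `cov1`, and `cov2` while leaving `dv` and `mod` untouched is just one possible selection. It also does nothing about the third point. `lav_betaselect()`, shown below, starts from the fitted `lavaan` object and handles all three.

```{r, eval = FALSE}
# Rough sketch of selective standardization done by hand.
# `dat` is a placeholder for the data set used in this vignette.
dat_z <- dat
to_z <- c("iv", "med", "cov1", "cov2")   # leave `dv` and `mod` as they are
dat_z[to_z] <- lapply(dat_z[to_z], function(x) as.numeric(scale(x)))
dat_z$iv_x_mod <- dat_z$iv * dat_z$mod   # product formed *after* standardization
mod_z <- "
med ~ iv + mod + iv_x_mod + cov1 + cov2
dv ~ med + cov1 + cov2
"
fit_z <- sem(mod_z, data = dat_z)
```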
We call the coefficients computed by -this kind of standardization *beta*s-Select +this kind of standardization *beta*s-select ($\beta{s}_{Select}$, $\beta_{Select}$ in singular form), to differentiate them from coefficients @@ -175,9 +177,9 @@ term standardized, the coefficient of `iv:mod` changed substantially from `r format_str(b_std)` to `r format_str(b_select)`. As shown by -Cheung et al. (2022), the coefficient +@cheung_improving_2022, the coefficient of *standardized* product term (`iv:mod`) -can be severely biased estimate of the +can be substantially different from the properly standardized product term (the product of standardized `iv` and standardized `mod`). @@ -191,10 +193,16 @@ are formed *after* standardization. Suppose we want to address both the first and the second problems, -with the product -term computed after `iv` and `mod` -standardized and bootstrap confidence -interval used, we can call `lav_betaselect()` +with + +- the product term computed after `iv` and `mod` + standardized, and + +- bootstrap confidence intervals used, that + take into account the sampling variation + of the standardizers (the standard deviations). + +We can call `lav_betaselect()` again, with additional arguments set: @@ -267,14 +275,14 @@ done using either `to_standardize` or `not_to_standardize`. - Use `to_standardize` when -the variables to standardize -is much fewer than the variables +the number of variables to standardize +is much fewer than the number of variables not to standardize. - Use `not_to_standardize` -when the variables to standardize +when the number variables to standardize is much more than the -variables not to standardize. +the number of variables not to standardize. For example, suppose we only need to standardize `dv` and @@ -310,7 +318,7 @@ fit_beta_select_2 <- lav_betaselect(fit, ``` The results of these calls are identical, -and only those of the first version are +and only those of the second version are printed: ```{r, eval = FALSE} @@ -322,14 +330,14 @@ tmp <- capture.output(print(fit_beta_select_2)) cat(tmp[c(2:27, 55:66)], sep = "\n") ``` -The footnotes confirmed that, by +The footnotes show that, by specifying that `dv` and `mod` are not standardized, all the other four variables are standardized: `iv`, `med`, `cov1`, and `cov2`. Therefore, in this case, it is more convenient to use `not_to_standardize`. -For *beta*s-*select*, researchers need +When reporting *beta*s-*select*, researchers need to state which variables are standardized and which are not. This can be done in table notes, @@ -370,7 +378,7 @@ before forming product terms. We are not aware of tools that can do appropriate standardization *and* form confidence intervals that takes into account the -selective Standardization. By promoting +selective standardization. By promoting the use of *beta*s-*select* using `lav_betaselect()`, we hope to make it easier for researchers to do appropriate @@ -378,11 +386,3 @@ Standardization in when reporting SEM results. # References - -Cheung, S. F., Cheung, S.-H., Lau, E. Y. Y., Hui, C. H., & Vong, W. N. (2022). Improving an old way to measure moderation effect in standardized units. *Health Psychology, 41*(7), 502--505. https://doi.org/10.1037/hea0001188 - -Falk, C. F. (2018). Are robust standard errors the best approach for interval estimation with nonnormal data in structural equation modeling? *Structural Equation Modeling: A Multidisciplinary Journal, 25*(2), 244--266. https://doi.org/10.1080/10705511.2017.1367254 - -Friedrich, R. J. (1982). 
In defense of multiplicative terms in multiple regression equations. *American Journal of Political Science, 26*(4), 797--833. https://doi.org/10.2307/2110973 - - diff --git a/vignettes/betaselectr_lm.Rmd b/vignettes/betaselectr_lm.Rmd index e656028..97255b0 100644 --- a/vignettes/betaselectr_lm.Rmd +++ b/vignettes/betaselectr_lm.Rmd @@ -1,6 +1,6 @@ --- title: "Beta-Select Demonstration: Regression by `lm()`" -date: "2024-10-30" +date: "2024-10-31" output: rmarkdown::html_vignette: number_sections: true @@ -20,7 +20,7 @@ csl: apa.csl This article demonstrates how to use `lm_betaselect()` from the package -`betaselectr` +[`betaselectr`](https://sfcheung.github.io/betaselectr/) to standardize selected variables in a model fitted by `lm()` and forming confidence @@ -176,14 +176,16 @@ printCoefmat(lm_std_common_summary$coefficients, However, for this model, there are several problems: -- The product term, `iv:mod`, is also - standardized. This is inappropriate. +- The product term is also + standardized (`iv_x_mod`, computed + using the standard deviations of + `dv` and `iv:mod`). This is inappropriate [@hayes_introduction_2022]. One simple but underused solution is standardizing the variables *before* forming the product term [@friedrich_defense_1982]. - The confidence intervals are formed using - the ordinary least squares (OLS), which does not + ordinary least squares (OLS), which does not take into account the sampling variation of the standardizers (the sample standard deviations used in standardization) and @@ -210,7 +212,8 @@ problems: are usually represented by dummy variables, each of them having only two possible values (0 or 1). It is not meaningful - to standardize the dummy variables. + to standardize the dummy variables + [@darlington_regression_2016]. # Beta-Select by `lm_betaselect()` @@ -229,7 +232,7 @@ to solve these problems by: into account selected standardization. We call the coefficients computed by -this kind of standardization *beta*s-Select +this kind of standardization *beta*s-select ($\beta{s}_{Select}$, $\beta_{Select}$ in singular form), to differentiate them from coefficients @@ -248,11 +251,11 @@ are standardized. ``` r lm_beta_select <- lm_betaselect(dv ~ iv*mod + cov1 + cat1, - data = data_test_mod_cat2, - do_boot = FALSE) + data = data_test_mod_cat2, + do_boot = FALSE) ``` -The function `lm_beta_iv_mod()` can be +The function `lm_betaselect()` can be used as `lm()`, with applicable arguments such as the model formula and `data` passed to `lm()`. @@ -267,7 +270,7 @@ string variables) will not be standardized. Bootstrapping is done by default. In this illustration, `do_boot = FALSE` is added -to disabled it because we only want to +to disable it because we only want to address the first problem. We will do bootstrapping when addressing the issue with confidence intervals. @@ -331,7 +334,7 @@ term standardized, the coefficient of 0.145. As shown by @cheung_improving_2022, the coefficient of *standardized* product term (`iv:mod`) -can be a severely biased estimate of the +can be substantially different from the properly standardized product term (the product of standardized `iv` and standardized `mod`). @@ -340,11 +343,15 @@ standardized `mod`). 
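The distinction is easy to see directly: the standardized product term and the product of the standardized components are two different variables. A small illustrative check, using the same data set as above:

``` r
# z-score of the product vs. product of the z-scores
z_of_product <- as.numeric(scale(data_test_mod_cat2$iv * data_test_mod_cat2$mod))
product_of_z <- as.numeric(scale(data_test_mod_cat2$iv)) *
  as.numeric(scale(data_test_mod_cat2$mod))
head(round(cbind(z_of_product, product_of_z), 3))
```

The coefficient labelled `iv:mod` in the fully standardized solution above is based on the first variable, whereas *beta*s-*select* are based on the second.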
Suppose we want to address both the first and the second problems, -with the product -term computed after `iv`, `mod`, and `dv` are -standardized and bootstrap confidence -interval used, we can call `lm_betaselect()` -again, with additional arguments +with + +- the product term computed after `iv`, + `mod`, and `dv` are standardized, and + +- bootstrap confidence interval used. + +We can call `lm_betaselect()` again, with +additional arguments set: @@ -362,13 +369,12 @@ These are the additional arguments: be set to 5000 or even 10000. - `iseed`: The seed for the random number - generator used for bootstrapping. Set + generator used in bootstrapping. Set this to an integer to make the results reproducible. - This is the output of `summary()` @@ -421,7 +427,7 @@ By default, 95% percentile bootstrap confidence intervals are printed (`CI.Lower` and `CI.Upper`). The *p*-values (`Pr(Boot)`) are asymmetric bootstrap -*p*-values. +*p*-values [@asparouhov_bootstrap_2021]. ## Estimates and Bootstrap Confidence Intervals, With Only Selected Variables Standardized @@ -432,13 +438,13 @@ done using either `to_standardize` or `not_to_standardize`. - Use `to_standardize` when -the variables to standardize -is much fewer than the variables +the number of variables to standardize +is much fewer than number of the variables not to standardize. - Use `not_to_standardize` -when the variables to standardize -is much more than the +when the number of variables to standardize +is much more than the number of variables not to standardize. For example, suppose we only @@ -532,11 +538,7 @@ This can be done in table notes. When calling `lm_betaselect()`, categorical variables (factors and -string variables) will not be standardized -by default. -This can be overriden by setting -`skip_categorical_x` to `FALSE`, though -not recommended. +string variables) will never be standardized. In the example above, the coefficients of the two dummy variables when both @@ -562,7 +564,7 @@ printCoefmat(lm_std_common_summary$coefficients[5:6, ], These two values are not interpretable because it does not make sense to talk -about a one-SD change in the dummy variables. +about a "one-SD change" in the dummy variables. The *beta*s-*Select* of the dummy variables, with only the outcome variable standardized, diff --git a/vignettes/betaselectr_lm.Rmd.original b/vignettes/betaselectr_lm.Rmd.original index ac1e044..37e4c9b 100644 --- a/vignettes/betaselectr_lm.Rmd.original +++ b/vignettes/betaselectr_lm.Rmd.original @@ -30,7 +30,7 @@ format_str <- function(x, digits = 3) { This article demonstrates how to use `lm_betaselect()` from the package -`betaselectr` +[`betaselectr`](https://sfcheung.github.io/betaselectr/) to standardize selected variables in a model fitted by `lm()` and forming confidence @@ -118,14 +118,16 @@ printCoefmat(lm_std_common_summary$coefficients, However, for this model, there are several problems: -- The product term, `iv:mod`, is also - standardized. This is inappropriate. +- The product term is also + standardized (`iv_x_mod`, computed + using the standard deviations of + `dv` and `iv:mod`). This is inappropriate [@hayes_introduction_2022]. One simple but underused solution is standardizing the variables *before* forming the product term [@friedrich_defense_1982]. 
- The confidence intervals are formed using - the ordinary least squares (OLS), which does not + ordinary least squares (OLS), which does not take into account the sampling variation of the standardizers (the sample standard deviations used in standardization) and @@ -152,7 +154,8 @@ problems: are usually represented by dummy variables, each of them having only two possible values (0 or 1). It is not meaningful - to standardize the dummy variables. + to standardize the dummy variables + [@darlington_regression_2016]. # Beta-Select by `lm_betaselect()` @@ -171,7 +174,7 @@ to solve these problems by: into account selected standardization. We call the coefficients computed by -this kind of standardization *beta*s-Select +this kind of standardization *beta*s-select ($\beta{s}_{Select}$, $\beta_{Select}$ in singular form), to differentiate them from coefficients @@ -189,11 +192,11 @@ are standardized. ```{r, results = FALSE} lm_beta_select <- lm_betaselect(dv ~ iv*mod + cov1 + cat1, - data = data_test_mod_cat2, - do_boot = FALSE) + data = data_test_mod_cat2, + do_boot = FALSE) ``` -The function `lm_beta_iv_mod()` can be +The function `lm_betaselect()` can be used as `lm()`, with applicable arguments such as the model formula and `data` passed to `lm()`. @@ -208,7 +211,7 @@ string variables) will not be standardized. Bootstrapping is done by default. In this illustration, `do_boot = FALSE` is added -to disabled it because we only want to +to disable it because we only want to address the first problem. We will do bootstrapping when addressing the issue with confidence intervals. @@ -235,7 +238,7 @@ term standardized, the coefficient of `r format_str(b_select["iv:mod"])`. As shown by @cheung_improving_2022, the coefficient of *standardized* product term (`iv:mod`) -can be a severely biased estimate of the +can be substantially different from the properly standardized product term (the product of standardized `iv` and standardized `mod`). @@ -244,11 +247,15 @@ standardized `mod`). Suppose we want to address both the first and the second problems, -with the product -term computed after `iv`, `mod`, and `dv` are -standardized and bootstrap confidence -interval used, we can call `lm_betaselect()` -again, with additional arguments +with + +- the product term computed after `iv`, + `mod`, and `dv` are standardized, and + +- bootstrap confidence interval used. + +We can call `lm_betaselect()` again, with +additional arguments set: ```{r results = FALSE} @@ -265,11 +272,10 @@ These are the additional arguments: be set to 5000 or even 10000. - `iseed`: The seed for the random number - generator used for bootstrapping. Set + generator used in bootstrapping. Set this to an integer to make the results reproducible. - ```{r, echo = FALSE} tmp <- capture.output(suppressWarnings(print(summary(lm_beta_select_boot)))) ``` @@ -284,7 +290,7 @@ By default, 95% percentile bootstrap confidence intervals are printed (`CI.Lower` and `CI.Upper`). The *p*-values (`Pr(Boot)`) are asymmetric bootstrap -*p*-values. +*p*-values [@asparouhov_bootstrap_2021]. ## Estimates and Bootstrap Confidence Intervals, With Only Selected Variables Standardized @@ -295,13 +301,13 @@ done using either `to_standardize` or `not_to_standardize`. - Use `to_standardize` when -the variables to standardize -is much fewer than the variables +the number of variables to standardize +is much fewer than number of the variables not to standardize. 
- Use `not_to_standardize` -when the variables to standardize -is much more than the +when the number of variables to standardize +is much more than the number of variables not to standardize. For example, suppose we only @@ -350,11 +356,7 @@ This can be done in table notes. When calling `lm_betaselect()`, categorical variables (factors and -string variables) will not be standardized -by default. -This can be overriden by setting -`skip_categorical_x` to `FALSE`, though -not recommended. +string variables) will never be standardized. In the example above, the coefficients of the two dummy variables when both @@ -374,7 +376,7 @@ printCoefmat(lm_std_common_summary$coefficients[5:6, ], These two values are not interpretable because it does not make sense to talk -about a one-SD change in the dummy variables. +about a "one-SD change" in the dummy variables. The *beta*s-*Select* of the dummy variables, with only the outcome variable standardized, diff --git a/vignettes/references.bib b/vignettes/references.bib index 2cf988f..dff1fc3 100644 --- a/vignettes/references.bib +++ b/vignettes/references.bib @@ -1,4 +1,61 @@ +@article{menard_six_2004, + title = {Six approaches to calculating standardized logistic regression coefficients}, + volume = {58}, + issn = {0003-1305}, + url = {https://doi.org/10.1198/000313004X946}, + doi = {10.1198/000313004X946}, + abstract = {This article reviews six alternative approaches to constructing standardized logistic regression coefficients. The least attractive of the options is the one currently most readily available in logistic regression software, the unstandardized coefficient divided by its standard error (which is actually the normal distribution version of the Wald statistic). One alternative has the advantage of simplicity, while a slightly more complex alternative most closely parallels the standardized coefficient in ordinary least squares regression, in the sense of being based on variance in the dependent variable and the predictors. 
The sixth alternative, based on information theory, may be the best from a conceptual standpoint, but unless and until appropriate algorithms are constructed to simplify its calculation, its use is limited to relatively simple logistic regression models in practical application.}, + number = {3}, + urldate = {2024-10-27}, + journal = {The American Statistician}, + author = {Menard, Scott}, + month = aug, + year = {2004}, + note = {Publisher: ASA Website +\_eprint: https://doi.org/10.1198/000313004X946}, + keywords = {Logistic Regression, Standardized Solution}, + pages = {218--223}, +} + + +@misc{asparouhov_bootstrap_2021, + title = {Bootstrap \textit{p}-value computation}, + url = {https://www.statmodel.com/download/FAQ-Bootstrap%20-%20Pvalue.pdf}, + author = {Asparouhov, Tihomir and Muthén, Bengt O.}, + year = {2021}, + keywords = {Bootstrapping, Top Among Top, Mplus, P-value}, +} + +@book{hayes_introduction_2022, + address = {New York, NY}, + edition = {Third edition}, + series = {Methodology in the social sciences}, + title = {Introduction to mediation, moderation, and conditional process analysis: {A} regression-based approach}, + isbn = {978-1-4625-4903-0}, + shorttitle = {Introduction to mediation, moderation, and conditional process analysis}, + abstract = {"Lauded for its easy-to-understand, conversational discussion of the fundamentals of mediation, moderation, and conditional process analysis, this book has been fully revised with 50\% new content, including sections on working with multicategorical antecedent variables, the use of PROCESS version 3 for SPSS and SAS for model estimation, and annotated PROCESS v3 outputs. Using the principles of ordinary least squares regression, Andrew F. Hayes carefully explains procedures for testing hypotheses about the conditions under and the mechanisms by which causal effects operate, as well as the moderation of such mechanisms. Hayes shows how to estimate and interpret direct, indirect, and conditional effects; probe and visualize interactions; test questions about moderated mediation; and report different types of analyses. Data for all the examples are available on the companion website ([ital]www.afhayes.com[/ital]), along with links to download PROCESS"--}, + publisher = {The Guilford Press}, + author = {Hayes, Andrew F.}, + year = {2022}, + keywords = {Textbook, Mediation, Moderation (Interaction), Moderated-Mediation and Mediated-Moderation, PROCESS (Hayes)}, +} + +@book{darlington_regression_2016, + address = {New York}, + edition = {Illustrated edition}, + title = {Regression analysis and linear models: {Concepts}, applications, and implementation}, + isbn = {978-1-4625-2113-5}, + shorttitle = {Regression {Analysis} and {Linear} {Models}}, + abstract = {Emphasizing conceptual understanding over mathematics, this user-friendly text introduces linear regression analysis to students and researchers across the social, behavioral, consumer, and health sciences. Coverage includes model construction and estimation, quantification and measurement of multivariate and partial associations, statistical control, group comparisons, moderation analysis, mediation and path analysis, and regression diagnostics, among other important topics. Engaging worked-through examples demonstrate each technique, accompanied by helpful advice and cautions. The use of SPSS, SAS, and STATA is emphasized, with an appendix on regression analysis using R. 
The companion website (www.afhayes.com) provides datasets for the book's examples as well as the RLM macro for SPSS and SAS. Pedagogical Features: *Chapters include SPSS, SAS, or STATA code pertinent to the analyses described, with each distinctively formatted for easy identification. *An appendix documents the RLM macro, which facilitates computations for estimating and probing interactions, dominance analysis, heteroscedasticity-consistent standard errors, and linear spline regression, among other analyses. *Students are guided to practice what they learn in each chapter using datasets provided online. *Addresses topics not usually covered, such as ways to measure a variable’s importance, coding systems for representing categorical variables, causation, and myths about testing interaction.}, + language = {English}, + publisher = {The Guilford Press}, + author = {Darlington, Richard B. and Hayes, Andrew F.}, + month = sep, + year = {2016}, + keywords = {Regression, Textbook}, +} + @article{falk_are_2018, title = {Are robust standard errors the best approach for interval estimation with nonnormal data in structural equation modeling?}, volume = {25}, From 5a2c5b213ab30e95713ebb066ded8c758be36519 Mon Sep 17 00:00:00 2001 From: Shu Fai Cheung Date: Thu, 31 Oct 2024 20:50:52 +0800 Subject: [PATCH 02/11] Change the pkgdown style --- _pkgdown.yml | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/_pkgdown.yml b/_pkgdown.yml index 4ca3000..70b95b9 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -4,7 +4,7 @@ url: https://sfcheung.github.io/betaselectr/ template: bootstrap: 5 - bootswatch: yeti + bootswatch: united theme: a11y-light bslib: # pkgdown-nav-height: 100px @@ -12,8 +12,8 @@ template: fg: "#000000" primary: "#6478FF" link-color: "#0000A0" - base_font: {google: "PT Serif"} - heading_font: {google: "PT Serif"} + base_font: {google: "Public Sans"} + heading_font: {google: "Public Sans"} code_font: {google: "Source Code Pro"} includes: in_header: From 7ccc79f04a67d6af9260d33e8b6f0d8b30a56352 Mon Sep 17 00:00:00 2001 From: Shu Fai Cheung Date: Thu, 31 Oct 2024 20:53:29 +0800 Subject: [PATCH 03/11] Update README.md --- README.md | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index 6b185ab..e65a261 100644 --- a/README.md +++ b/README.md @@ -14,11 +14,13 @@ Not ready for use. (Version 0.0.1.17, updated on 2024-10-30, [release history](https://sfcheung.github.io/betaselectr/news/index.html)) -It computes Beta_Select, standardization -in structural equation models with only +It computes *beta*-select, standardization +in structural equation models and +regression models with only selected variables standardized. It -supports models with moderation, as well -as regression models. It can form +supports models with moderation, with +product terms formed appropriately +(formed *after* standardization). It can also form confidence intervals that takes into account the standardization appropriately. @@ -38,3 +40,5 @@ work-in-progress. Not ready for use. If you have any suggestions and found any bugs, please feel feel to open a GitHub issue. Thanks. 
+ +https://github.com/sfcheung/betaselectr/issues \ No newline at end of file From 0c7674b527ce39b0e2a2c7afa23710c15f5a7794 Mon Sep 17 00:00:00 2001 From: Shu Fai Cheung Date: Thu, 31 Oct 2024 20:56:20 +0800 Subject: [PATCH 04/11] Update the categorization of functions --- _pkgdown.yml | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/_pkgdown.yml b/_pkgdown.yml index 70b95b9..9f57779 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -56,13 +56,17 @@ reference: # Section description. - contents: - std_data -- title: Methods +- title: Methods for lav_betaselect() # desc: > # Section description. - contents: - print.lav_betaselect - coef.lav_betaselect - confint.lav_betaselect +- title: Methods for lm_betaselect() and glm_betaselect() + # desc: > + # Section description. +- contents: - anova.lm_betaselect - coef.lm_betaselect - confint.lm_betaselect From 9ce13aa00d050fbab03b287ca2b263f24895f79b Mon Sep 17 00:00:00 2001 From: Shu Fai Cheung Date: Thu, 31 Oct 2024 21:41:32 +0800 Subject: [PATCH 05/11] Proofread the doc --- R/coef_lav_betaselect.R | 2 +- R/confint_lav_betaselect.R | 11 +++++--- R/lav_betaselect.R | 49 +++++++++++++++++++++-------------- R/lm_betaselect.R | 12 ++++----- R/lm_betaselect_methods.R | 12 ++++----- R/print_std_selected_lavaan.R | 21 ++++++++------- 6 files changed, 60 insertions(+), 47 deletions(-) diff --git a/R/coef_lav_betaselect.R b/R/coef_lav_betaselect.R index d24b57b..67f21b4 100644 --- a/R/coef_lav_betaselect.R +++ b/R/coef_lav_betaselect.R @@ -10,7 +10,7 @@ #' #' @return #' A numeric vector: The betas-select -#' in the object. Names of parameters +#' in the object. The names of parameters #' follow the convention in `lavaan`. #' #' @param object The output of diff --git a/R/confint_lav_betaselect.R b/R/confint_lav_betaselect.R index 291596d..93cd697 100644 --- a/R/confint_lav_betaselect.R +++ b/R/confint_lav_betaselect.R @@ -1,13 +1,16 @@ -#' @title Confidence Interval for a +#' @title Confidence Intervals for a #' 'lav_betaselect'-Class Object #' #' @description Return the confidence #' intervals of betas-select in the #' output of [lav_betaselect()]. #' -#' @details Details -#' (Include subjects for verbs.) -#' (Use 3rd person forms for verbs.) +#' @details +#' The type of +#' confidence intervals depends +#' on the call to [lav_betaselect()]. +#' This function does not recompute +#' the confidence interval. #' #' @return #' A two-column matrix of the confidence diff --git a/R/lav_betaselect.R b/R/lav_betaselect.R index 4567f5d..8848319 100644 --- a/R/lav_betaselect.R +++ b/R/lav_betaselect.R @@ -1,4 +1,4 @@ -#' @title Standardize Coefficients in a 'lavaan'-Model +#' @title Betas-Select in a 'lavaan'-Model #' #' @description Can standardize selected #' variables in a `lavaan` model without @@ -19,26 +19,28 @@ #' #' - It does not standardize product #' term, which is incorrect. Instead, -#' it compute the product term with -#' its component variables standardized. +#' it computes the product term with +#' its component variables standardized +#' first. #' #' - It can be used to generate bootstrap #' confidence intervals for the -#' standardized solution. Bootstrap +#' standardized solution (Falk, 2018). Bootstrap #' confidence interval is better than #' doing standardization *before* fitting #' a model because it correctly takes #' into account the sampling variance #' of the standard deviations. 
It is -#' also better than delta method +#' also better than delta-method #' confidence interval because it takes #' into account the usually asymmetric #' distribution of parameters after -#' standardization. +#' standardization, such as standardized +#' loadings and correlations. #' #' - For comparison, it can also report -#' delta method standard errors and -#' confidence intervals. +#' delta-method standard errors and +#' confidence intervals if requested. #' #' ## Problems With Common Approaches #' @@ -50,7 +52,7 @@ #' or misleading in these conditions: #' #' - Dummy variables are standardized -#' and can not be interpreted as the +#' and their coefficients cannot be interpreted as the #' difference between two groups on the #' outcome variables. #' @@ -66,14 +68,14 @@ #' they are standardized (e.g., age). #' #' Moreover, the delta method is usually -#' used, which is suboptimal for +#' used in standardization, which is suboptimal for #' standardization unless the sample -#' size is large. For example, the +#' size is large (Falk, 2018). For example, the #' covariance with variables standardized #' is a correlation, and its sampling #' distribution is skewed unless its #' population value is zero. However, -#' delta method confidence interval +#' delta-method confidence interval #' for the correlation is necessarily #' symmetric around the point estimate. #' @@ -92,7 +94,8 @@ #' - Intercepts not supported. #' #' @return -#' A data frame storing the parameter +#' A `lav_betaselect`-class object, +#' which is a data frame storing the parameter #' estimates, similar in form to the #' output of [lavaan::parameterEstimates()]. #' @@ -103,9 +106,9 @@ #' @param to_standardize A string vector, #' which should be the names of the #' variables to be standardized. -#' Default is `".all"`, indicating all +#' Default is `".all."`, indicating all #' variables are to be standardized -#' (but see `skip_categorical`). +#' (but see `skip_categorical_x`). #' #' @param not_to_standardize A string #' vector, which should be the names @@ -127,14 +130,15 @@ #' `to_standardize`. That is, a #' categorical predictor will not be #' standardized even if listed in -#' `to_standardize`, unless uses set +#' `to_standardize`, unless users set #' this argument to `FALSE`. #' #' @param output The format of the #' output. Not used because the format -#' of the print out is now controlled +#' of the printout is now controlled #' by the `print`-method of the output -#' of this function. +#' of this function. Kept for backward +#' compatibility. #' #' @param std_se String. If set to `"none"`, #' the default, standard errors will not @@ -193,7 +197,7 @@ #' output. #' #' @param delta_method The method used -#' to compute delta method standard +#' to compute delta-method standard #' errors. For internal use and should #' not be changed. #' @@ -308,6 +312,13 @@ #' (2022) Improving an old way to measure moderation effect in standardized #' units. *Health Psychology*, *41*(7), 502-505. #' \doi{10.1037/hea0001188} +#' +#' Falk, C. F. (2018). Are robust standard errors the best approach +#' for interval estimation with nonnormal data in structural equation +#' modeling? +#' *Structural Equation Modeling: A Multidisciplinary Journal, 25*(2) +#' 244-266. \doi{10.1080/10705511.2017.1367254} + #' #' @seealso [print.lav_betaselect()] for its print method. 
#' diff --git a/R/lm_betaselect.R b/R/lm_betaselect.R index bad876a..f8f2ac9 100644 --- a/R/lm_betaselect.R +++ b/R/lm_betaselect.R @@ -1,4 +1,4 @@ -#' @title Standardize Coefficients in a +#' @title Betas-Select in a #' Regression Model #' #' @description Can fit a linear regression @@ -63,7 +63,7 @@ #' or misleading in these conditions: #' #' - Dummy variables are standardized -#' and cannot be interpreted as the +#' and their coefficients cannot be interpreted as the #' difference between two groups on the #' outcome variables. #' @@ -226,8 +226,7 @@ #' which should be the names of the #' variables to be standardized. #' Default is `NULL`, indicating all -#' variables are to be standardized -#' (but see `skip_categorical`). +#' variables are to be standardized. #' #' @param not_to_standardize A string #' vector, which should be the names @@ -270,7 +269,7 @@ #' parallel processing will be used to #' do bootstrapping. Default is `FALSE` #' because bootstrapping for models fitted -#' by [lm()] or [glm()] is rarely slow. +#' by [stats::lm()] or [stats::glm()] is rarely slow. #' Actually, if both `parallel` and #' `progress` are set to `TRUE`, the #' speed may even be slower than serial @@ -546,7 +545,8 @@ glm_betaselect <- function(..., #' object. #' #' @details This is a helper functions -#' to be used by [lm_betaselect()]. It +#' to be used by [lm_betaselect()] +#' and [glm_betaselect()]. It #' assumes that the variables selected #' has been checked whether they are #' numeric. diff --git a/R/lm_betaselect_methods.R b/R/lm_betaselect_methods.R index 11520b3..f7ed40b 100644 --- a/R/lm_betaselect_methods.R +++ b/R/lm_betaselect_methods.R @@ -11,11 +11,11 @@ #' selected variables have been #' standardized. If requested, it can #' also return the regression -#' coefficients *without* +#' coefficients *before* #' standardization. #' #' @return -#' A scalar vector: The estimate of +#' A numeric vector: The estimate of #' regression coefficients. #' #' @param object The output of @@ -288,13 +288,11 @@ vcov.glm_betaselect <- vcov.lm_betaselect #' was requested, by default it returns #' the percentile bootstrap confidence #' intervals. Otherwise, it returns the -#' default confidence intervals -#' and raises a warning for the -#' standardized solution. +#' default confidence intervals. #' #' Support for other type of -#' confidence intervals will be -#' added. +#' confidence intervals may be +#' added in the future. #' #' @return #' A *p* by 2 matrix of the confidence diff --git a/R/print_std_selected_lavaan.R b/R/print_std_selected_lavaan.R index 1591dd9..a626c82 100644 --- a/R/print_std_selected_lavaan.R +++ b/R/print_std_selected_lavaan.R @@ -1,6 +1,6 @@ #' @title Print a 'lav_betaselect' Object #' -#' @description Print method for an +#' @description Print method for a #' 'lav_betaselect' object, which #' is the output of #' [lav_betaselect()]. @@ -23,8 +23,9 @@ #' which is compact but not easy to #' read. #' -#' @param x Object of the class -#' `std_solution_boot`. +#' @param x A `lav_betaselect`-class +#' object, such as the output of +#' [lav_betaselect()]. #' #' @param ... Optional arguments to be #' passed to [print()] methods. @@ -47,10 +48,10 @@ #' with `output` set to `"data.frame"`. #' #' @param standardized_only Logical. -#' If `TRUE`, only the +#' If `TRUE`, the default, only the #' results for the standardized solution #' will be printed. 
If `FALSE`, -#' the default, then +#' then #' the standardized solution is printed #' alongside the unstandardized solution, #' as in the printout of the output @@ -61,21 +62,21 @@ #' and `output` is `"lavaan.printer"`, then the #' column `"Bs.by"` is shown, #' indicating, for each parameter, the -#' variables standardized. Otherwise, -#' this column is not shown if `output` +#' variables standardized. +#' This column is not shown if `output` #' is not `"lavaan.printer"`. #' #' @param by_group If `TRUE`, the #' default, and the model has more than -#' one groups, sections will be grouped +#' one group, sections will be grouped #' by groups first, as in the print #' out of `summary()` in `lavaan`. #' If `FALSE`, then the sections will #' be grouped by sections first. #' #' @param na_str The string to be used -#' for cells with `NA``. Default is -#' `" "``, a white space. +#' for cells with `NA`. Default is +#' `" "`, a whitespace. #' #' @param sig_stars If `TRUE`, the #' default, symbols such as asterisks From adac0863f5ec965e284550b315690469f70fca00 Mon Sep 17 00:00:00 2001 From: Shu Fai Cheung Date: Thu, 31 Oct 2024 21:54:41 +0800 Subject: [PATCH 06/11] 0.0.1.18: Update doc and meta-data Tests, checks, and build_site() passed. --- DESCRIPTION | 31 ++++++++++++++++------ NEWS.md | 7 +++-- README.md | 2 +- man/coef.lav_betaselect.Rd | 2 +- man/coef.lm_betaselect.Rd | 4 +-- man/confint.lav_betaselect.Rd | 10 +++++--- man/confint.lm_betaselect.Rd | 8 +++--- man/lav_betaselect.Rd | 48 +++++++++++++++++++++-------------- man/lm_betaselect.Rd | 9 +++---- man/print.lav_betaselect.Rd | 20 ++++++++------- man/std_data.Rd | 3 ++- 11 files changed, 87 insertions(+), 57 deletions(-) diff --git a/DESCRIPTION b/DESCRIPTION index c901d06..d42e00b 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,17 +1,32 @@ Package: betaselectr -Title: Selective Standardization in Structural Equation Models -Version: 0.0.1.17 +Title: Betas-Select in Structural Equation Models and Linear Models +Version: 0.0.1.18 Authors@R: c(person(given = "Shu Fai", family = "Cheung", role = c("aut", "cre"), email = "shufai.cheung@gmail.com", - comment = c(ORCID = "0000-0002-9871-9448"))) -Description: It computes Beta_Select, standardization in - structural equation models with only selected variables - standardized. It supports models with moderation, as well - as regression models. It can form confidence intervals - that takes into account the standardization appropriately. + comment = c(ORCID = "0000-0002-9871-9448")), + person(given = "Rong Wei", + family = "Sun", + role = c("aut"), + comment = c(ORCID = "0000-0003-0034-1422")), + person(given = "Florbela", + family = "Chang", + role = c("aut"), + comment = c(ORCID = "0009-0003-9931-501X")), + person(given = "Sing-Hang", + family = "Cheung", + role = c("aut"), + comment = c(ORCID = "0000-0001-5182-0752"))) +Description: It computes betas-select, standardization in + structural equation models and regression models with only + selected variables standardized. It supports models with + moderation, with product terms formed after standardiztion. + It can also form confidence intervals that takes into account + the standardization appropriately, such as bootstrap + confidendce intervals proposed by Cheung, Cheung, Lau, Hui, + and Vong (2022) . 
License: GPL (>= 3) Encoding: UTF-8 Roxygen: list(markdown = TRUE) diff --git a/NEWS.md b/NEWS.md index d3d8945..1e1af05 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,4 +1,4 @@ -# betaselectr 0.0.1.17 +# betaselectr 0.0.1.18 - Added `lm_betaselect()` and related methods and helper functions. @@ -96,4 +96,7 @@ users. (0.0.1.16) - Updated the vignettes and the `pkgdown` - site. (0.0.1.17) \ No newline at end of file + site. (0.0.1.17) + +- Proofread the documentation and + update meta-data. (0.0.1.18) \ No newline at end of file diff --git a/README.md b/README.md index e65a261..eb44133 100644 --- a/README.md +++ b/README.md @@ -12,7 +12,7 @@ Not ready for use. # betaselectr: Do selective standardization in structural equation models and regression models -(Version 0.0.1.17, updated on 2024-10-30, [release history](https://sfcheung.github.io/betaselectr/news/index.html)) +(Version 0.0.1.18, updated on 2024-10-31, [release history](https://sfcheung.github.io/betaselectr/news/index.html)) It computes *beta*-select, standardization in structural equation models and diff --git a/man/coef.lav_betaselect.Rd b/man/coef.lav_betaselect.Rd index f347d4a..01bf2c4 100644 --- a/man/coef.lav_betaselect.Rd +++ b/man/coef.lav_betaselect.Rd @@ -19,7 +19,7 @@ used.} } \value{ A numeric vector: The betas-select -in the object. Names of parameters +in the object. The names of parameters follow the convention in \code{lavaan}. } \description{ diff --git a/man/coef.lm_betaselect.Rd b/man/coef.lm_betaselect.Rd index 40c8cf1..4fabce7 100644 --- a/man/coef.lm_betaselect.Rd +++ b/man/coef.lm_betaselect.Rd @@ -45,7 +45,7 @@ Default is \code{"beta"}.} \item{...}{Other arguments. Ignored.} } \value{ -A scalar vector: The estimate of +A numeric vector: The estimate of regression coefficients. } \description{ @@ -60,7 +60,7 @@ regression coefficients \emph{after} the selected variables have been standardized. If requested, it can also return the regression -coefficients \emph{without} +coefficients \emph{before} standardization. } \examples{ diff --git a/man/confint.lav_betaselect.Rd b/man/confint.lav_betaselect.Rd index dd23d1a..3b674af 100644 --- a/man/confint.lav_betaselect.Rd +++ b/man/confint.lav_betaselect.Rd @@ -2,7 +2,7 @@ % Please edit documentation in R/confint_lav_betaselect.R \name{confint.lav_betaselect} \alias{confint.lav_betaselect} -\title{Confidence Interval for a +\title{Confidence Intervals for a 'lav_betaselect'-Class Object} \usage{ \method{confint}{lav_betaselect}(object, parm, level = 0.95, ...) @@ -33,9 +33,11 @@ intervals of betas-select in the output of \code{\link[=lav_betaselect]{lav_betaselect()}}. } \details{ -Details -(Include subjects for verbs.) -(Use 3rd person forms for verbs.) +The type of +confidence intervals depends +on the call to \code{\link[=lav_betaselect]{lav_betaselect()}}. +This function does not recompute +the confidence interval. } \examples{ diff --git a/man/confint.lm_betaselect.Rd b/man/confint.lm_betaselect.Rd index 5b93455..2ba4441 100644 --- a/man/confint.lm_betaselect.Rd +++ b/man/confint.lm_betaselect.Rd @@ -131,13 +131,11 @@ on the object. If bootstrapping was requested, by default it returns the percentile bootstrap confidence intervals. Otherwise, it returns the -default confidence intervals -and raises a warning for the -standardized solution. +default confidence intervals. Support for other type of -confidence intervals will be -added. +confidence intervals may be +added in the future. 
} \examples{ diff --git a/man/lav_betaselect.Rd b/man/lav_betaselect.Rd index 0654107..119e6cf 100644 --- a/man/lav_betaselect.Rd +++ b/man/lav_betaselect.Rd @@ -2,7 +2,7 @@ % Please edit documentation in R/lav_betaselect.R \name{lav_betaselect} \alias{lav_betaselect} -\title{Standardize Coefficients in a 'lavaan'-Model} +\title{Betas-Select in a 'lavaan'-Model} \usage{ lav_betaselect( object, @@ -37,9 +37,9 @@ as \code{\link[lavaan:sem]{lavaan::sem()}} and \code{\link[lavaan:cfa]{lavaan::c \item{to_standardize}{A string vector, which should be the names of the variables to be standardized. -Default is \code{".all"}, indicating all +Default is \code{".all."}, indicating all variables are to be standardized -(but see \code{skip_categorical}).} +(but see \code{skip_categorical_x}).} \item{not_to_standardize}{A string vector, which should be the names @@ -61,14 +61,15 @@ overrides the argument \code{to_standardize}. That is, a categorical predictor will not be standardized even if listed in -\code{to_standardize}, unless uses set +\code{to_standardize}, unless users set this argument to \code{FALSE}.} \item{output}{The format of the output. Not used because the format -of the print out is now controlled +of the printout is now controlled by the \code{print}-method of the output -of this function.} +of this function. Kept for backward +compatibility.} \item{std_se}{String. If set to \code{"none"}, the default, standard errors will not @@ -222,7 +223,7 @@ which will be use to generate the output.} \item{delta_method}{The method used -to compute delta method standard +to compute delta-method standard errors. For internal use and should not be changed.} @@ -232,7 +233,8 @@ solution. For internal use and should not be changed.} } \value{ -A data frame storing the parameter +A \code{lav_betaselect}-class object, +which is a data frame storing the parameter estimates, similar in form to the output of \code{\link[lavaan:parameterEstimates]{lavaan::parameterEstimates()}}. } @@ -256,24 +258,26 @@ which has only two unique values, assuming that they are dummy variables. \item It does not standardize product term, which is incorrect. Instead, -it compute the product term with -its component variables standardized. +it computes the product term with +its component variables standardized +first. \item It can be used to generate bootstrap confidence intervals for the -standardized solution. Bootstrap +standardized solution (Falk, 2018). Bootstrap confidence interval is better than doing standardization \emph{before} fitting a model because it correctly takes into account the sampling variance of the standard deviations. It is -also better than delta method +also better than delta-method confidence interval because it takes into account the usually asymmetric distribution of parameters after -standardization. +standardization, such as standardized +loadings and correlations. \item For comparison, it can also report -delta method standard errors and -confidence intervals. +delta-method standard errors and +confidence intervals if requested. } \subsection{Problems With Common Approaches}{ @@ -285,7 +289,7 @@ The solution may be uninterpretable or misleading in these conditions: \itemize{ \item Dummy variables are standardized -and can not be interpreted as the +and their coefficients cannot be interpreted as the difference between two groups on the outcome variables. \item Product terms (interaction terms) @@ -300,14 +304,14 @@ they are standardized (e.g., age). 
} Moreover, the delta method is usually -used, which is suboptimal for +used in standardization, which is suboptimal for standardization unless the sample -size is large. For example, the +size is large (Falk, 2018). For example, the covariance with variables standardized is a correlation, and its sampling distribution is skewed unless its population value is zero. However, -delta method confidence interval +delta-method confidence interval for the correlation is necessarily symmetric around the point estimate. } @@ -368,6 +372,12 @@ Cheung, S. F., Cheung, S.-H., Lau, E. Y. Y., Hui, C. H., & Vong, W. N. (2022) Improving an old way to measure moderation effect in standardized units. \emph{Health Psychology}, \emph{41}(7), 502-505. \doi{10.1037/hea0001188} + +Falk, C. F. (2018). Are robust standard errors the best approach +for interval estimation with nonnormal data in structural equation +modeling? +\emph{Structural Equation Modeling: A Multidisciplinary Journal, 25}(2) +244-266. \doi{10.1080/10705511.2017.1367254} } \seealso{ \code{\link[=print.lav_betaselect]{print.lav_betaselect()}} for its print method. diff --git a/man/lm_betaselect.Rd b/man/lm_betaselect.Rd index 2b46b56..d586426 100644 --- a/man/lm_betaselect.Rd +++ b/man/lm_betaselect.Rd @@ -6,7 +6,7 @@ \alias{print.lm_betaselect} \alias{print.glm_betaselect} \alias{raw_output} -\title{Standardize Coefficients in a +\title{Betas-Select in a Regression Model} \usage{ lm_betaselect( @@ -70,8 +70,7 @@ other methods.} which should be the names of the variables to be standardized. Default is \code{NULL}, indicating all -variables are to be standardized -(but see \code{skip_categorical}).} +variables are to be standardized.} \item{not_to_standardize}{A string vector, which should be the names @@ -114,7 +113,7 @@ generator. Default is \code{NULL}.} parallel processing will be used to do bootstrapping. Default is \code{FALSE} because bootstrapping for models fitted -by \code{\link[=lm]{lm()}} or \code{\link[=glm]{glm()}} is rarely slow. +by \code{\link[stats:lm]{stats::lm()}} or \code{\link[stats:glm]{stats::glm()}} is rarely slow. Actually, if both \code{parallel} and \code{progress} are set to \code{TRUE}, the speed may even be slower than serial @@ -241,7 +240,7 @@ The solution may be uninterpretable or misleading in these conditions: \itemize{ \item Dummy variables are standardized -and cannot be interpreted as the +and their coefficients cannot be interpreted as the difference between two groups on the outcome variables. \item Product terms (interaction terms) diff --git a/man/print.lav_betaselect.Rd b/man/print.lav_betaselect.Rd index 84a74b8..9b946c5 100644 --- a/man/print.lav_betaselect.Rd +++ b/man/print.lav_betaselect.Rd @@ -18,8 +18,9 @@ ) } \arguments{ -\item{x}{Object of the class -\code{std_solution_boot}.} +\item{x}{A \code{lav_betaselect}-class +object, such as the output of +\code{\link[=lav_betaselect]{lav_betaselect()}}.} \item{...}{Optional arguments to be passed to \code{\link[=print]{print()}} methods.} @@ -42,10 +43,10 @@ format similar to that of with \code{output} set to \code{"data.frame"}.} \item{standardized_only}{Logical. -If \code{TRUE}, only the +If \code{TRUE}, the default, only the results for the standardized solution will be printed. 
If \code{FALSE}, -the default, then +then the standardized solution is printed alongside the unstandardized solution, as in the printout of the output @@ -56,20 +57,21 @@ object.} and \code{output} is \code{"lavaan.printer"}, then the column \code{"Bs.by"} is shown, indicating, for each parameter, the -variables standardized. Otherwise, -this column is not shown if \code{output} +variables standardized. +This column is not shown if \code{output} is not \code{"lavaan.printer"}.} \item{by_group}{If \code{TRUE}, the default, and the model has more than -one groups, sections will be grouped +one group, sections will be grouped by groups first, as in the print out of \code{summary()} in \code{lavaan}. If \code{FALSE}, then the sections will be grouped by sections first.} \item{na_str}{The string to be used -for cells with \verb{NA``. Default is }" "``, a white space.} +for cells with \code{NA}. Default is +\code{" "}, a whitespace.} \item{sig_stars}{If \code{TRUE}, the default, symbols such as asterisks @@ -86,7 +88,7 @@ on its confidence interval.} \code{x} is returned invisibly. Called for its side effect. } \description{ -Print method for an +Print method for a 'lav_betaselect' object, which is the output of \code{\link[=lav_betaselect]{lav_betaselect()}}. diff --git a/man/std_data.Rd b/man/std_data.Rd index 330bcd4..13e43d7 100644 --- a/man/std_data.Rd +++ b/man/std_data.Rd @@ -25,7 +25,8 @@ object. } \details{ This is a helper functions -to be used by \code{\link[=lm_betaselect]{lm_betaselect()}}. It +to be used by \code{\link[=lm_betaselect]{lm_betaselect()}} +and \code{\link[=glm_betaselect]{glm_betaselect()}}. It assumes that the variables selected has been checked whether they are numeric. From bf43ec60bbe25201b653493839583ce7c1faad3b Mon Sep 17 00:00:00 2001 From: Shu Fai Cheung Date: Thu, 31 Oct 2024 22:15:20 +0800 Subject: [PATCH 07/11] Fix CRAN check issues --- DESCRIPTION | 4 ++-- R/print_std_selected_lavaan.R | 4 ++-- man/print.lav_betaselect.Rd | 4 ++-- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/DESCRIPTION b/DESCRIPTION index d42e00b..5a66e86 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -22,10 +22,10 @@ Authors@R: Description: It computes betas-select, standardization in structural equation models and regression models with only selected variables standardized. It supports models with - moderation, with product terms formed after standardiztion. + moderation, with product terms formed after standardization. It can also form confidence intervals that takes into account the standardization appropriately, such as bootstrap - confidendce intervals proposed by Cheung, Cheung, Lau, Hui, + confidence intervals proposed by Cheung, Cheung, Lau, Hui, and Vong (2022) . License: GPL (>= 3) Encoding: UTF-8 diff --git a/R/print_std_selected_lavaan.R b/R/print_std_selected_lavaan.R index a626c82..7bcc7d4 100644 --- a/R/print_std_selected_lavaan.R +++ b/R/print_std_selected_lavaan.R @@ -40,7 +40,7 @@ #' printed in a format similar to #' the printout of the output of #' the `summary`-method of a -#' `lavaan-class` object. +#' 'lavaan'-class object. #' If set to `"table"`, the results are #' printed in a table #' format similar to that of @@ -55,7 +55,7 @@ #' the standardized solution is printed #' alongside the unstandardized solution, #' as in the printout of the output -#' of [summary()] of a [lavaan-class] +#' of [summary()] of a 'lavaan'-class #' object. #' #' @param show_Bs.by Logical. 
If `TRUE` diff --git a/man/print.lav_betaselect.Rd b/man/print.lav_betaselect.Rd index 9b946c5..305e310 100644 --- a/man/print.lav_betaselect.Rd +++ b/man/print.lav_betaselect.Rd @@ -35,7 +35,7 @@ and the results will be printed in a format similar to the printout of the output of the \code{summary}-method of a -\code{lavaan-class} object. +'lavaan'-class object. If set to \code{"table"}, the results are printed in a table format similar to that of @@ -50,7 +50,7 @@ then the standardized solution is printed alongside the unstandardized solution, as in the printout of the output -of \code{\link[=summary]{summary()}} of a \linkS4class{lavaan} +of \code{\link[=summary]{summary()}} of a 'lavaan'-class object.} \item{show_Bs.by}{Logical. If \code{TRUE} From 0d807b067764ea12b64d69605f9ceeebdc0e6d29 Mon Sep 17 00:00:00 2001 From: Shu Fai Cheung Date: Thu, 31 Oct 2024 23:07:34 +0800 Subject: [PATCH 08/11] 0.0.1.19: Enable more tests --- DESCRIPTION | 2 +- NEWS.md | 6 +- README.md | 2 +- tests/testthat/test-lav_betaselect_confint.R | 1 - tests/testthat/test-lav_betaselect_tmp.R | 2 +- tests/testthat/test_glm_betaselect_expb.R | 6 +- tests/testthat/test_glm_betaselect_methods.R | 98 ++++++++++---------- 7 files changed, 57 insertions(+), 60 deletions(-) diff --git a/DESCRIPTION b/DESCRIPTION index 5a66e86..f4fa8d3 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: betaselectr Title: Betas-Select in Structural Equation Models and Linear Models -Version: 0.0.1.18 +Version: 0.0.1.19 Authors@R: c(person(given = "Shu Fai", family = "Cheung", diff --git a/NEWS.md b/NEWS.md index 1e1af05..a2d8f31 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,4 +1,4 @@ -# betaselectr 0.0.1.18 +# betaselectr 0.0.1.19 - Added `lm_betaselect()` and related methods and helper functions. @@ -99,4 +99,6 @@ site. (0.0.1.17) - Proofread the documentation and - update meta-data. (0.0.1.18) \ No newline at end of file + update meta-data. (0.0.1.18) + +- Enabled more tests. (0.0.1.19) \ No newline at end of file diff --git a/README.md b/README.md index eb44133..680be40 100644 --- a/README.md +++ b/README.md @@ -12,7 +12,7 @@ Not ready for use. 
# betaselectr: Do selective standardization in structural equation models and regression models -(Version 0.0.1.18, updated on 2024-10-31, [release history](https://sfcheung.github.io/betaselectr/news/index.html)) +(Version 0.0.1.19, updated on 2024-10-31, [release history](https://sfcheung.github.io/betaselectr/news/index.html)) It computes *beta*-select, standardization in structural equation models and diff --git a/tests/testthat/test-lav_betaselect_confint.R b/tests/testthat/test-lav_betaselect_confint.R index cb0bf0e..ddccb48 100644 --- a/tests/testthat/test-lav_betaselect_confint.R +++ b/tests/testthat/test-lav_betaselect_confint.R @@ -1,4 +1,3 @@ -skip("WIP") skip_on_cran() library(testthat) diff --git a/tests/testthat/test-lav_betaselect_tmp.R b/tests/testthat/test-lav_betaselect_tmp.R index 70f7d7b..0074b54 100644 --- a/tests/testthat/test-lav_betaselect_tmp.R +++ b/tests/testthat/test-lav_betaselect_tmp.R @@ -1,4 +1,4 @@ -skip("WIP") +skip("Not used") #Load a test data of 500 cases data(test_modmed) diff --git a/tests/testthat/test_glm_betaselect_expb.R b/tests/testthat/test_glm_betaselect_expb.R index f8f1d93..d786fad 100644 --- a/tests/testthat/test_glm_betaselect_expb.R +++ b/tests/testthat/test_glm_betaselect_expb.R @@ -1,11 +1,9 @@ -skip("WIP") library(testthat) -dat <- data_test_mod_cat_binary test_that("transform b", { - lm_beta_x <- glm_betaselect(dv ~ iv*mod + cov1 + cat1, dat, to_standardize = "iv", do_boot = FALSE, family = binomial) - lm_beta_x_boot <- glm_betaselect(dv ~ iv*mod + cov1 + cat1, dat, to_standardize = "iv", do_boot = TRUE, bootstrap = 6, parallel = FALSE, iseed = 5678, progress = FALSE, family = binomial) + lm_beta_x <- glm_betaselect(dv ~ iv*mod + cov1 + cat1, data_test_mod_cat_binary, to_standardize = "iv", do_boot = FALSE, family = binomial) + lm_beta_x_boot <- glm_betaselect(dv ~ iv*mod + cov1 + cat1, data_test_mod_cat_binary, to_standardize = "iv", do_boot = TRUE, bootstrap = 6, parallel = FALSE, iseed = 5678, progress = FALSE, family = binomial) tmp1 <- summary(lm_beta_x) tmp2 <- summary(lm_beta_x, transform_b = exp, transform_b_name = "Exp(B)") diff --git a/tests/testthat/test_glm_betaselect_methods.R b/tests/testthat/test_glm_betaselect_methods.R index 80d72e0..3c56e5d 100644 --- a/tests/testthat/test_glm_betaselect_methods.R +++ b/tests/testthat/test_glm_betaselect_methods.R @@ -1,16 +1,14 @@ -skip("WIP") - # Adapted from stdmod library(testthat) library(boot) -data(data_test_mod_cat) -data_test_mod_cat$dv <- ifelse(data_test_mod_cat$dv > mean(data_test_mod_cat$dv), - yes = 1, - no = 0) +data(data_test_mod_cat_binary) +# data_test_mod_cat$dv <- ifelse(data_test_mod_cat$dv > mean(data_test_mod_cat$dv), +# yes = 1, +# no = 0) -dat <- data_test_mod_cat +dat <- data_test_mod_cat_binary transform0 <- function(data, vars) { for (x in vars) { @@ -20,37 +18,37 @@ transform0 <- function(data, vars) { } -lm_raw <- glm(dv ~ iv*mod + cov1 + cat1, data_test_mod_cat, family = binomial) -lm_raw2 <- glm(dv ~ iv + mod + cov1 + cat1, data_test_mod_cat, family = binomial) -lm_zx <- glm(dv ~ iv*mod + cov1 + cat1, transform0(data_test_mod_cat, c("iv")), family = binomial) -lm_zw <- glm(dv ~ iv*mod + cov1 + cat1, transform0(data_test_mod_cat, c("mod")), family = binomial) -lm_zxzw <- glm(dv ~ iv*mod + cov1 + cat1, transform0(data_test_mod_cat, c("iv", "mod")), family = binomial) -lm_zall <- glm(dv ~ iv*mod + cov1 + cat1, transform0(data_test_mod_cat, c("iv", "mod", "cov1")), family = binomial) -lm_inline <- glm(dv ~ I(iv^2)*mod + I(1 / cov1) + cat1, 
transform0(data_test_mod_cat, c("iv", "mod", "cov1")), family = binomial) -lm_inline_raw <- glm(dv ~ I(iv^2)*mod + I(1 / cov1) + cat1, data_test_mod_cat, family = binomial) +lm_raw <- glm(dv ~ iv*mod + cov1 + cat1, data_test_mod_cat_binary, family = binomial) +lm_raw2 <- glm(dv ~ iv + mod + cov1 + cat1, data_test_mod_cat_binary, family = binomial) +lm_zx <- glm(dv ~ iv*mod + cov1 + cat1, transform0(data_test_mod_cat_binary, c("iv")), family = binomial) +lm_zw <- glm(dv ~ iv*mod + cov1 + cat1, transform0(data_test_mod_cat_binary, c("mod")), family = binomial) +lm_zxzw <- glm(dv ~ iv*mod + cov1 + cat1, transform0(data_test_mod_cat_binary, c("iv", "mod")), family = binomial) +lm_zall <- glm(dv ~ iv*mod + cov1 + cat1, transform0(data_test_mod_cat_binary, c("iv", "mod", "cov1")), family = binomial) +lm_inline <- glm(dv ~ I(iv^2)*mod + I(1 / cov1) + cat1, transform0(data_test_mod_cat_binary, c("iv", "mod", "cov1")), family = binomial) +lm_inline_raw <- glm(dv ~ I(iv^2)*mod + I(1 / cov1) + cat1, data_test_mod_cat_binary, family = binomial) -dat_tmp <- data_test_mod_cat +dat_tmp <- data_test_mod_cat_binary # dat_tmp$iv <- scale(dat$iv, scale = FALSE, center = TRUE)[, 1] # dat_tmp$mod <- scale(dat$mod, scale = sd(dat$mod), center = FALSE)[, 1] dat_tmp$iv <- scale(dat$iv)[, 1] lm_raw_x <- glm(dv ~ iv*mod + cov1 + cat1, dat_tmp, family = binomial) -lm_beta_x <- glm_betaselect(dv ~ iv*mod + cov1 + cat1, dat, to_standardize = "iv", do_boot = FALSE, family = binomial) -lm_beta_x2 <- glm_betaselect(dv ~ iv + mod + cov1 + cat1, dat, to_standardize = "iv", do_boot = FALSE, family = binomial) -lm_beta_w <- glm_betaselect(dv ~ iv*mod + cov1 + cat1, dat, to_standardize = "mod", do_boot = FALSE, family = binomial) -lm_beta_xw <- glm_betaselect(dv ~ iv*mod + cov1 + cat1, dat, to_standardize = c("mod", "iv"), do_boot = FALSE, family = binomial) -lm_beta_inline <- glm_betaselect(dv ~ I(iv^2)*mod + I(1/ cov1) + cat1, dat, not_to_standardize = "dv", do_boot = FALSE, family = binomial) -lm_beta_xyw_boot <- glm_betaselect(dv ~ iv*mod + cov1 + cat1, dat, not_to_standardize = "dv", do_boot = TRUE, bootstrap = 100, iseed = 1234, progress = FALSE, family = binomial) +lm_beta_x <- glm_betaselect(dv ~ iv*mod + cov1 + cat1, data_test_mod_cat_binary, to_standardize = "iv", do_boot = FALSE, family = binomial) +lm_beta_x2 <- glm_betaselect(dv ~ iv + mod + cov1 + cat1, data_test_mod_cat_binary, to_standardize = "iv", do_boot = FALSE, family = binomial) +lm_beta_w <- glm_betaselect(dv ~ iv*mod + cov1 + cat1, data_test_mod_cat_binary, to_standardize = "mod", do_boot = FALSE, family = binomial) +lm_beta_xw <- glm_betaselect(dv ~ iv*mod + cov1 + cat1, data_test_mod_cat_binary, to_standardize = c("mod", "iv"), do_boot = FALSE, family = binomial) +lm_beta_inline <- glm_betaselect(dv ~ I(iv^2)*mod + I(1/ cov1) + cat1, data_test_mod_cat_binary, not_to_standardize = "dv", do_boot = FALSE, family = binomial) +lm_beta_xyw_boot <- glm_betaselect(dv ~ iv*mod + cov1 + cat1, data_test_mod_cat_binary, not_to_standardize = "dv", do_boot = TRUE, bootstrap = 100, iseed = 1234, progress = FALSE, family = binomial) set.seed(1234) -n <- nrow(data_test_mod_cat) +n <- nrow(data_test_mod_cat_binary) i <- replicate(100, sample(n, size = n, replace = TRUE), simplify = FALSE) tmp <- sapply(i, function(xx) { - coef(glm(dv ~ iv*mod + cov1 + cat1, dat[xx, ], family = binomial)) + coef(glm(dv ~ iv*mod + cov1 + cat1, data_test_mod_cat_binary[xx, ], family = binomial)) }) vcov_raw_chk <- cov(t(tmp)) set.seed(1234) -lm_raw_boot <- boot(dat, +lm_raw_boot <- 
boot(data_test_mod_cat_binary, function(d, i) { coef(glm(dv ~ iv*mod + cov1 + cat1, d[i, ], family = binomial)) }, @@ -73,8 +71,8 @@ test_that("coef", { }) test_that("vcov", { - expect_warning(expect_warning(vcov(lm_beta_x), "changed"), - "should not") + # expect_warning(expect_warning(vcov(lm_beta_x), "changed"), + # "should not") expect_warning(vcov(lm_beta_x, method = "default")) expect_equal(vcov(lm_beta_xyw_boot, method = "boot"), vcov(lm_beta_xyw_boot)) @@ -86,8 +84,8 @@ test_that("vcov", { }) test_that("confint", { - expect_warning(expect_warning(suppressMessages(confint(lm_beta_x)), "changed"), - "should not") + # expect_warning(expect_warning(suppressMessages(confint(lm_beta_x)), "changed"), + # "should not") expect_warning(suppressMessages(confint(lm_beta_x, method = "ls"))) expect_equal(confint(lm_beta_xyw_boot, method = "boot", level = .80, parm = c("(Intercept)", "cat1gp2")), @@ -115,11 +113,11 @@ test_that("anova", { test_that("summary", { lm_beta_x_lm <- lm_beta_x class(lm_beta_x_lm) <- "glm" - expect_warning(summary(lm_beta_x), - "changed") - expect_equal(summary(lm_beta_x, type = "raw", se_method = "default")$coefficients, + # expect_warning(summary(lm_beta_x), + # "changed") + expect_equal(summary(lm_beta_x, type = "raw", se_method = "default", ci = FALSE)$coefficients, summary(lm_raw)$coefficients) - expect_equal(summary(lm_beta_x, type = "beta", se_method = "default")$coefficients, + expect_equal(summary(lm_beta_x, type = "beta", se_method = "default", ci = FALSE)$coefficients, summary(lm_beta_x_lm)$coefficients) expect_no_error(summary(lm_beta_xyw_boot)) expect_equal(summary(lm_beta_xyw_boot, ci = TRUE, level = .90)$coefficients[, 2:3], @@ -189,17 +187,17 @@ test_that("plot.lm", { test_that("predict", { expect_equal(predict(lm_beta_xw, model_type = "raw"), predict(lm_raw)) - dat_tmp3 <- data_test_mod_cat[10:20, ] + dat_tmp3 <- data_test_mod_cat_binary[10:20, ] dat_tmp3$iv <- scale(dat_tmp3$iv)[, 1] - expect_equal(predict(lm_beta_x, newdata = data_test_mod_cat[10:20, ]), + expect_equal(predict(lm_beta_x, newdata = data_test_mod_cat_binary[10:20, ]), predict(lm_raw_x, newdata = dat_tmp3)) }) # add1 -lm_raw_0 <- glm(dv ~ iv + mod, data_test_mod_cat, family = binomial) -lm_raw_1a <- glm(dv ~ iv + mod + cov1, data_test_mod_cat, family = binomial) -lm_raw_1b <- glm(dv ~ iv + mod + cat1, data_test_mod_cat, family = binomial) +lm_raw_0 <- glm(dv ~ iv + mod, data_test_mod_cat_binary, family = binomial) +lm_raw_1a <- glm(dv ~ iv + mod + cov1, data_test_mod_cat_binary, family = binomial) +lm_raw_1b <- glm(dv ~ iv + mod + cat1, data_test_mod_cat_binary, family = binomial) add1(lm_raw_0, ~ . 
+ cov1 + cat1) extractAIC(lm_raw_1a) extractAIC(lm_raw_1b) @@ -210,7 +208,7 @@ drop1(lm_raw_1a, ~ cov1) extractAIC(lm_raw_0) # anova(lm_raw_0)["Residuals", "Sum Sq"] -dat_tmp2 <- data_test_mod_cat +dat_tmp2 <- data_test_mod_cat_binary dat_tmp2$iv <- scale(dat_tmp2$iv)[, 1] dat_tmp2$cov1 <- scale(dat_tmp2$cov1)[, 1] # dat_tmp2$dv <- scale(dat_tmp2$dv)[, 1] @@ -228,9 +226,9 @@ extractAIC(lm_beta_manual_0) # anova(lm_beta_manual_0)["(lm_beta_manual_0siduals", "Sum Sq"] test_that("add1() and drop1()", { - lm_beta_0 <- glm_betaselect(dv ~ iv + mod, data_test_mod_cat, to_standardize = c("iv", "cov1"), progress = FALSE, do_boot = FALSE, family = binomial) - lm_beta_1a <- glm_betaselect(dv ~ iv + mod + cov1, data_test_mod_cat, to_standardize = c("iv", "cov1"), progress = FALSE, do_boot = FALSE, family = binomial) - lm_beta_1b <- glm_betaselect(dv ~ iv + mod + cat1, data_test_mod_cat, to_standardize = c("iv", "cov1"), progress = FALSE, do_boot = FALSE, family = binomial) + lm_beta_0 <- glm_betaselect(dv ~ iv + mod, data_test_mod_cat_binary, to_standardize = c("iv", "cov1"), progress = FALSE, do_boot = FALSE, family = binomial) + lm_beta_1a <- glm_betaselect(dv ~ iv + mod + cov1, data_test_mod_cat_binary, to_standardize = c("iv", "cov1"), progress = FALSE, do_boot = FALSE, family = binomial) + lm_beta_1b <- glm_betaselect(dv ~ iv + mod + cat1, data_test_mod_cat_binary, to_standardize = c("iv", "cov1"), progress = FALSE, do_boot = FALSE, family = binomial) add1_out <- add1(lm_beta_0, ~ . + cov1 + cat1) expect_equal(add1_out["cov1", "AIC"], extractAIC(lm_beta_manual_1a)[2]) @@ -248,13 +246,13 @@ test_that("add1() and drop1()", { anova(lm_beta_manual_0)["Residuals", "Sum Sq"]) }) -lm_beta_u0 <- glm_betaselect(dv ~ iv, data_test_mod_cat, to_standardize = "iv", do_boot = FALSE, family = binomial) -lm_beta_u1 <- glm_betaselect(dv ~ iv + mod + cov1, data_test_mod_cat, to_standardize = "iv", do_boot = FALSE, family = binomial) -lm_beta_u2 <- glm_betaselect(dv ~ iv*mod + cov1, data_test_mod_cat, to_standardize = "iv", do_boot = FALSE, family = binomial) -lm_beta_u3 <- glm_betaselect(dv ~ iv*mod + cov1 + cat1, data_test_mod_cat, to_standardize = "iv", do_boot = FALSE, family = binomial) -lm_beta_u4 <- glm_betaselect(dv ~ iv + mod + cov1 + cat1, data_test_mod_cat, to_standardize = "iv", do_boot = FALSE, family = binomial) -lm_beta_u5 <- glm_betaselect(dv ~ iv + mod + cov1, data_test_mod_cat, to_standardize = "mod", do_boot = FALSE, family = binomial) -lm_beta_u6 <- glm_betaselect(dv ~ iv + mod + cov1, data_test_mod_cat[20:50, ], to_standardize = "iv", do_boot = FALSE, family = binomial) +lm_beta_u0 <- glm_betaselect(dv ~ iv, data_test_mod_cat_binary, to_standardize = "iv", do_boot = FALSE, family = binomial) +lm_beta_u1 <- glm_betaselect(dv ~ iv + mod + cov1, data_test_mod_cat_binary, to_standardize = "iv", do_boot = FALSE, family = binomial) +lm_beta_u2 <- glm_betaselect(dv ~ iv*mod + cov1, data_test_mod_cat_binary, to_standardize = "iv", do_boot = FALSE, family = binomial) +lm_beta_u3 <- glm_betaselect(dv ~ iv*mod + cov1 + cat1, data_test_mod_cat_binary, to_standardize = "iv", do_boot = FALSE, family = binomial) +lm_beta_u4 <- glm_betaselect(dv ~ iv + mod + cov1 + cat1, data_test_mod_cat_binary, to_standardize = "iv", do_boot = FALSE, family = binomial) +lm_beta_u5 <- glm_betaselect(dv ~ iv + mod + cov1, data_test_mod_cat_binary, to_standardize = "mod", do_boot = FALSE, family = binomial) +lm_beta_u6 <- glm_betaselect(dv ~ iv + mod + cov1, data_test_mod_cat_binary[20:50, ], to_standardize = "iv", do_boot = 
FALSE, family = binomial) test_that("getCall", { expect_equal(as.character(getCall(lm_beta_u1)[[1]])[3], @@ -286,7 +284,7 @@ test_that("update", { lm_beta_tmp <- update(lm_beta_u1, to_standardize = "mod") expect_equal(sort(coef(lm_beta_tmp)), sort(coef(lm_beta_u5))) - lm_beta_tmp <- update(lm_beta_u0, ~ . + mod + cov1, data = data_test_mod_cat[20:50, ]) + lm_beta_tmp <- update(lm_beta_u0, ~ . + mod + cov1, data = data_test_mod_cat_binary[20:50, ]) expect_equal(sort(coef(lm_beta_tmp)), sort(coef(lm_beta_u6))) }) From 8513768474e99fd341aca6a04c399bf1547f9518 Mon Sep 17 00:00:00 2001 From: Shu Fai Cheung Date: Sat, 2 Nov 2024 00:33:36 +0800 Subject: [PATCH 09/11] Disable a parallel test --- tests/testthat/test_lm_betaselect_boot.R | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/tests/testthat/test_lm_betaselect_boot.R b/tests/testthat/test_lm_betaselect_boot.R index 6970d4b..dd9585d 100644 --- a/tests/testthat/test_lm_betaselect_boot.R +++ b/tests/testthat/test_lm_betaselect_boot.R @@ -29,7 +29,7 @@ i <- replicate(6, sample(n, size = n, replace = TRUE), simplify = FALSE) dat_tmp <- dat_tmp[i[[5]], ] lm_beta_x <- lm_betaselect(dv ~ iv*mod + cov1 + cat1, data_test_mod_cat, to_standardize = "iv", do_boot = TRUE, bootstrap = 6, iseed = 5678, progress = FALSE) -lm_beta_y <- lm_betaselect(dv ~ iv*mod + cov1 + cat1, data_test_mod_cat, to_standardize = "dv", do_boot = TRUE, bootstrap = 6, iseed = 5678, progress = FALSE, parallel = TRUE, ncpus = 2) +# lm_beta_y <- lm_betaselect(dv ~ iv*mod + cov1 + cat1, data_test_mod_cat, to_standardize = "dv", do_boot = TRUE, bootstrap = 6, iseed = 5678, progress = FALSE, parallel = TRUE, ncpus = 2) lm_beta_w <- lm_betaselect(dv ~ iv*mod + cov1 + cat1, data_test_mod_cat, to_standardize = "mod", do_boot = TRUE, bootstrap = 6, iseed = 5678, progress = FALSE) lm_beta_xw <- lm_betaselect(dv ~ iv*mod + cov1 + cat1, data_test_mod_cat, to_standardize = c("mod", "iv"), do_boot = TRUE, bootstrap = 6, iseed = 5678, progress = FALSE) lm_beta_yw <- lm_betaselect(dv ~ iv*mod + cov1 + cat1, data_test_mod_cat, to_standardize = c("mod", "dv"), do_boot = TRUE, bootstrap = 6, iseed = 5678, progress = FALSE) @@ -45,14 +45,14 @@ test_that("Standardize x", { ) }) -test_that("Standardize y", { - tmp1 <- lm_beta_y$lm_betaselect$boot_out[[5]]$coef_std - tmp2 <- coef(update(lm_zx, data = transform0(dat_tmp, c("dv")))) - expect_equal( - tmp1, tmp2, - ignore_attr = TRUE - ) - }) +# test_that("Standardize y", { +# tmp1 <- lm_beta_y$lm_betaselect$boot_out[[5]]$coef_std +# tmp2 <- coef(update(lm_zx, data = transform0(dat_tmp, c("dv")))) +# expect_equal( +# tmp1, tmp2, +# ignore_attr = TRUE +# ) +# }) test_that("Standardize w", { tmp1 <- lm_beta_w$lm_betaselect$boot_out[[5]]$coef_std From 9f34050b916cccf49671cd103785fab66a96ef9a Mon Sep 17 00:00:00 2001 From: Shu Fai Cheung Date: Sat, 2 Nov 2024 00:33:58 +0800 Subject: [PATCH 10/11] 0.0.2.0: Prepare for CRAN Tests, checks, and build_site() passed. 
--- DESCRIPTION | 6 +++++- NEWS.md | 4 +++- README.md | 2 +- 3 files changed, 9 insertions(+), 3 deletions(-) diff --git a/DESCRIPTION b/DESCRIPTION index f4fa8d3..656622d 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: betaselectr Title: Betas-Select in Structural Equation Models and Linear Models -Version: 0.0.1.19 +Version: 0.0.2.0 Authors@R: c(person(given = "Shu Fai", family = "Cheung", @@ -15,6 +15,10 @@ Authors@R: family = "Chang", role = c("aut"), comment = c(ORCID = "0009-0003-9931-501X")), + person(given = "Wendie", + family ="Yang", + role = c("ctb"), + comment = c(ORCID = "0009-0000-8388-6481")), person(given = "Sing-Hang", family = "Cheung", role = c("aut"), diff --git a/NEWS.md b/NEWS.md index a2d8f31..a2424ee 100644 --- a/NEWS.md +++ b/NEWS.md @@ -101,4 +101,6 @@ - Proofread the documentation and update meta-data. (0.0.1.18) -- Enabled more tests. (0.0.1.19) \ No newline at end of file +- Enabled more tests. (0.0.1.19) + +- Prepare for CRAN. (0.0.2.0) \ No newline at end of file diff --git a/README.md b/README.md index 680be40..f9178cc 100644 --- a/README.md +++ b/README.md @@ -12,7 +12,7 @@ Not ready for use. # betaselectr: Do selective standardization in structural equation models and regression models -(Version 0.0.1.19, updated on 2024-10-31, [release history](https://sfcheung.github.io/betaselectr/news/index.html)) +(Version 0.0.2.0, updated on 2024-11-02, [release history](https://sfcheung.github.io/betaselectr/news/index.html)) It computes *beta*-select, standardization in structural equation models and From 67ddd4d5e1168029eb51f885cf444cdbf10e5335 Mon Sep 17 00:00:00 2001 From: Shu Fai Cheung Date: Sat, 2 Nov 2024 00:38:06 +0800 Subject: [PATCH 11/11] Update to active and add installation instruction --- README.md | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index f9178cc..5c05622 100644 --- a/README.md +++ b/README.md @@ -1,15 +1,12 @@ [![Lifecycle: stable](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental) -[![Project Status: WIP - Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.](https://www.repostatus.org/badges/latest/wip.svg)](https://www.repostatus.org/#wip) +[![Project Status: Active - The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active) [![Code size](https://img.shields.io/github/languages/code-size/sfcheung/betaselectr.svg)](https://github.com/sfcheung/betaselectr) [![Last Commit at Main](https://img.shields.io/github/last-commit/sfcheung/betaselectr.svg)](https://github.com/sfcheung/betaselectr/commits/main) [![R-CMD-check](https://github.com/sfcheung/betaselectr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/sfcheung/betaselectr/actions/workflows/R-CMD-check.yaml) -**IMPORTANT**: It is a work-in-progress. -Not ready for use. - # betaselectr: Do selective standardization in structural equation models and regression models (Version 0.0.2.0, updated on 2024-11-02, [release history](https://sfcheung.github.io/betaselectr/news/index.html)) @@ -32,8 +29,12 @@ https://sfcheung.github.io/betaselectr/ # Installation -**DO NOT INSTALL**: It is a -work-in-progress. Not ready for use. 
+The latest developmental version of this +package can be installed by `remotes::install_github`: + +```r +remotes::install_github("sfcheung/betaselectr") +``` # Issues
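
For a quick sense of how the two entry points documented in these patches are called, below is a minimal usage sketch, assuming `betaselectr` is installed (for example, via `remotes::install_github()` as above) and `lavaan` is available. It only uses function and argument names that appear in the diffs above (`glm_betaselect()`, `lav_betaselect()`, `to_standardize`, `do_boot`, `bootstrap`, `iseed`, `standardized_only`, `show_Bs.by`); the structural-equation model and its simulated data are purely hypothetical illustrations, not examples from the package.

```r
# A minimal sketch, assuming betaselectr and lavaan are installed.
# data_test_mod_cat_binary is the example dataset used in the tests above.
library(betaselectr)
library(lavaan)

# Logistic regression with only iv and mod standardized; the iv:mod
# product term is formed after standardization, and bootstrap
# confidence intervals take the standardization into account.
fit_glm <- glm_betaselect(dv ~ iv * mod + cov1 + cat1,
                          data = data_test_mod_cat_binary,
                          to_standardize = c("iv", "mod"),
                          family = binomial(),
                          do_boot = TRUE,
                          bootstrap = 2000,
                          iseed = 1234)
summary(fit_glm)

# Betas-select for a lavaan model: fit the model as usual, then
# standardize only selected variables. The model and data below are
# hypothetical and serve only to illustrate the call pattern.
set.seed(1234)
n <- 200
x <- rnorm(n)
w <- rnorm(n)
y <- 0.3 * x + 0.2 * w + 0.1 * x * w + rnorm(n)
dat_sem <- data.frame(x = x, w = w, xw = x * w, y = y)

fit_sem <- sem("y ~ x + w + xw", data = dat_sem, fixed.x = FALSE)
fit_bs <- lav_betaselect(fit_sem,
                         to_standardize = c("x", "y"))

# Print only the standardized solution, with a column showing which
# variables were standardized for each parameter.
print(fit_bs, standardized_only = TRUE, show_Bs.by = TRUE)
```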