Rationale for Box Cox Transformation #27

cyhsieh-psy · 2023-09-13T13:19:08Z

Is there any basic background knowledge about the Box-Cox transformation that we, as researchers, should know? Are there any references you would recommend for justifying the transformation when we use the package?

palday · 2023-09-13T13:27:47Z · Maintainer

Read the package documentation and check out the original Box-Cox paper. There are also lots of good teaching examples available online if you search a little. If you find one that you really like, please post it here so that other students can find it as well!

Here's the first link to the package documentation: https://palday.github.io/BoxCox.jl/stable/mixed-models/
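For quick reference before opening the paper, this is the one-parameter family defined in the original Box-Cox paper. It applies only to strictly positive data, and λ is usually chosen by maximum likelihood. Setting λ = 1 leaves the data unchanged apart from subtracting 1. Setting λ = 1/2 gives a rescaled, shifted square root, λ = 0 gives the log, and λ = -1 gives 1 - 1/y, which is the reciprocal with its sign flipped so that the ordering of the data is preserved.

```math
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\\[4pt]
\log y, & \lambda = 0,
\end{cases}
\qquad y > 0.
```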

kliegl · 2023-09-13T13:53:47Z · Collaborator

This is a bit dated (it is from Kliegl, Masson, & Richter, 2009, Visual Cognition, pp. 676-678), but we can use it as a starting point for a conversation.

To transform or not to transform?

Invariably, questions are raised about the justification for choosing a transformation of the dependent variable. This issue has been addressed many times in the psychological research literature, with little impact. One exception is that, compared to 20 years ago, logarithmic transformation of RTs seems to need no justification. One reason probably is that, for the assessment of fixed effects, it rarely matters. Our analyses support the assumption that this kind of tacit knowledge guides individual decisions about the transformation question. Irrespective of whether we look at untransformed, log-transformed, or reciprocal RTs, we consistently obtain significant main effects of priming, frequency, and visual familiarity. Matters change substantially, however, when we look at correlations between individual differences in these effects, or at model estimates of such correlations. Transformation may not influence the pattern of differences between means, but it can drastically alter the pattern of correlations between effects, or between effects and mean response speed.
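Here is a toy illustration of that last point. The numbers are made up and are not the paper's data. Five hypothetical subjects are each exactly 10% slower in condition B than in condition A, which is an assumption chosen only to make the arithmetic transparent. Every subject shows an effect in the same direction in both metrics. Whether slower subjects have "larger" effects, however, depends entirely on the metric.

```python
# Hypothetical toy data (not from Kliegl et al., 2009): each subject's RT in
# condition B is exactly 10% slower than in condition A.

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

baseline_ms = [400.0, 500.0, 600.0, 700.0, 800.0]  # condition A
slow_ms = [1.1 * b for b in baseline_ms]             # condition B

# Raw metric: extra milliseconds in condition B.
effect_ms = [s - b for s, b in zip(slow_ms, baseline_ms)]
# Reciprocal metric: speed lost in condition B, in responses per second.
effect_speed = [1000.0 / b - 1000.0 / s for s, b in zip(slow_ms, baseline_ms)]

# Subject-level slowness, on the same (ms) scale for both comparisons.
mean_rt = [(s + b) / 2 for s, b in zip(slow_ms, baseline_ms)]

r_raw = pearson(mean_rt, effect_ms)
r_recip = pearson(mean_rt, effect_speed)

print("slower in B (ms metric):   ", all(e > 0 for e in effect_ms))
print("slower in B (speed metric):", all(e > 0 for e in effect_speed))
print(f"r(mean RT, effect in ms)  = {r_raw:+.2f}")
print(f"r(mean RT, effect in 1/s) = {r_recip:+.2f}")
```

In ms, slower subjects show larger effects (a perfect positive correlation). In speed units, slower subjects show smaller effects (a strongly negative correlation). In the log metric every subject's effect is the same, log(1.1), so there is no between-subject variance left to correlate at all.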

So which metric is the correct one? Sometimes psychologists do not use transformations such as those suggested by the Box-Cox procedure because they perceive RTs as the natural metric. In the linear model, coefficients reflect the additional time due to an experimental effect; that is, the time it takes for a hypothetical cognitive process to finish. Thus, they give priority to additivity of time and attempt to explain the general positive skew of RT distributions and their heteroscedasticity across conditions as a consequence of internal information processing (e.g., Logan, 1992; Wagenmakers & Brown, 2007).

In contrast, effects estimated in a transformed RT metric may not have an obvious interpretation. There is some force to this argument, but at least the two transformations considered in this paper do have psychologically plausible interpretations. 1/RT leads to an interpretation of coefficients as additive changes in processing rate, possibly tied to, for example, speed or neural spike rates (e.g., Carpenter, 1981; Carpenter & Williams, 1995). Coefficients estimated from log(RT) inform about the size and reliability of standard RT effects in multiplicative, rather than additive, terms. Thus, we can also develop models with these metrics.
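To make the three interpretations concrete, here is a small sketch. It expresses the same hypothetical 50 ms slowing in each metric at two different baselines. The function name and the RT values are illustrative assumptions only.

```python
import math

def effect_in_three_metrics(rt_a_ms, rt_b_ms):
    """Express one condition difference (B minus A) in three metrics."""
    return {
        # additive extra time, in ms
        "ms": rt_b_ms - rt_a_ms,
        # difference in log RT; exp() of it is the multiplicative slowing factor
        "log": math.log(rt_b_ms) - math.log(rt_a_ms),
        # change in processing rate, in responses per second (negative = slower)
        "rate_per_s": 1000.0 / rt_b_ms - 1000.0 / rt_a_ms,
    }

for a, b in [(500.0, 550.0), (1000.0, 1050.0)]:
    e = effect_in_three_metrics(a, b)
    print(f"{a:.0f} -> {b:.0f} ms: +{e['ms']:.0f} ms, "
          f"x{math.exp(e['log']):.3f}, {e['rate_per_s']:+.3f} responses/s")
```

The same 50 ms difference is a 10% slowing (a rate change of about -0.18 responses/s) at a 500 ms baseline. At a 1000 ms baseline it is only a 5% slowing (about -0.05 responses/s).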

The general problem, however, is that, if the linear model is to be used for statistical inference, then it simply does not make sense to work with a yardstick for which the precision of measurement changes with the size of the object to be measured. The only generally applicable (i.e., independent of specific content domains) and meaningful estimate of the precision of our measurement scale is the standard deviation. Therefore, if statistical inference is intended for fixed and random experimental effects, one solution is to transform one’s scale such that the same standard deviation holds across the entire range, a characteristic that often does not hold when untransformed RT data are considered (Wagenmakers & Brown, 2007). The reciprocal transformation appears to achieve this goal for the RT data considered here; for other data sets, logarithmic, square-root, or no transformations may be called for.
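A first-order (delta-method) sketch helps explain why the right transformation depends on how the standard deviation grows with the mean. Assume, for this illustration only, that the SD of RT in a condition with mean μ is σ(μ) = c μ^k. Under that assumption, the Box-Cox transformation with λ = 1 - k makes the SD approximately constant.

```math
\begin{aligned}
\operatorname{SD}\bigl[g(Y)\bigr] &\approx \lvert g'(\mu)\rvert\,\sigma(\mu), \\
g(y) = \frac{y^{\lambda}-1}{\lambda} \;\Rightarrow\; g'(y) = y^{\lambda-1}
  &\;\Rightarrow\; \operatorname{SD}\bigl[g(Y)\bigr] \approx \mu^{\lambda-1}\,\sigma(\mu), \\
\sigma(\mu) = c\,\mu^{k}
  &\;\Rightarrow\; \operatorname{SD}\bigl[g(Y)\bigr] \approx c\,\mu^{\lambda-1+k},
  \text{ constant in } \mu \text{ iff } \lambda = 1-k.
\end{aligned}
```

The four transformations map onto the mean-SD relationship as follows:

- SD unrelated to the mean (k = 0) calls for no transformation (λ = 1).
- SD growing with the square root of the mean (k = 1/2) calls for the square root (λ = 1/2).
- SD proportional to the mean (k = 1) calls for the log (λ = 0).
- SD proportional to the squared mean (k = 2) calls for the reciprocal (λ = -1).

The log case also follows from the same formula, because d(log y)/dy = y^{-1}.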

Nevertheless, some theoretical constructs make perfect sense in one metric, but not in another. So, can't we have our cake and eat it, too? The standard linear model requires a normally distributed measure, but RTs obviously do not have this property. They appear, however, to be well described by, for example, lognormal or gamma distributions. If one is theoretically committed to such a distribution (e.g., the SWIFT model of eye movement control in reading randomly samples the starting times of saccade programmes from a gamma distribution; Engbert, Nuthmann, Richter, & Kliegl, 2005), then an elegant solution, one that preserves interpretation in the standard RT metric, is to switch from the linear mixed model to a generalized linear mixed model (GLMM) for statistical inference. The disadvantage of this approach is that GLMM coefficients (in particular from crossed-random effects models) must be numerically approximated rather than computed from a closed-form solution (Bates, 2008a). Also, the interpretation of coefficients is less straightforward for GLMM than for LMM.
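This sketch shows why a gamma or lognormal model can absorb the kind of heteroscedasticity that otherwise motivates a transformation. With a fixed gamma shape (or a fixed lognormal σ), the SD grows in proportion to the mean, so the coefficient of variation stays constant. The parameter values (shape 10, σ = 0.3) are hypothetical.

```python
import math

def gamma_mean_sd(shape, scale):
    """Mean and SD of a gamma distribution with the given shape and scale."""
    return shape * scale, math.sqrt(shape) * scale

def lognormal_mean_sd(mu, sigma):
    """Mean and SD of exp(X) for X ~ Normal(mu, sigma^2)."""
    mean = math.exp(mu + sigma ** 2 / 2)
    sd = mean * math.sqrt(math.exp(sigma ** 2) - 1)
    return mean, sd

# Hypothetical settings: gamma shape fixed at 10, lognormal sigma fixed at 0.3,
# while the condition mean moves from roughly 400 ms to roughly 1000 ms.
for target_ms in (400.0, 700.0, 1000.0):
    g_mean, g_sd = gamma_mean_sd(10.0, target_ms / 10.0)
    l_mean, l_sd = lognormal_mean_sd(math.log(target_ms), 0.3)
    print(f"gamma: mean {g_mean:6.1f}, SD {g_sd:5.1f}, CV {g_sd / g_mean:.3f} | "
          f"lognormal: mean {l_mean:6.1f}, SD {l_sd:5.1f}, CV {l_sd / l_mean:.3f}")
```

A constant coefficient of variation is the same mean-variance pattern that a log transform removes. The GLMM route builds this pattern into the model instead of transforming it away.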

There are also recent developments to estimate crossed-random effects of subjects and items in a Bayesian framework (Rouder, Lu, Speckman, Sun, & Jiang, 2005). This approach opens the way to using distributions outside the exponential family. It also allows parent distributions other than the normal for the parameters. For example, Rouder et al. (2005) showed that RTs for symbolic distance effects (e.g., judging the difference between numerically adjacent and nonadjacent digits) are best described with a three-parameter Weibull distribution, assuming also that the parameters themselves are gamma distributed. Their Bayesian estimation of a hierarchical Weibull model also represents an alternative to maximum likelihood estimation. In their simulations, the Bayesian approach outperformed eight alternative estimation methods for Weibull parameters at the individual or group level. The advantage is especially striking when simulations are based on only 20 rather than 80 observations per subject. Finally, Rouder et al. point out that their approach can be expanded to achieve what has been described in this paper: the simultaneous estimation of variance/covariance component parameters for subjects and items.
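To make the model structure concrete, here is a sketch that simulates data from a simplified hierarchical shifted (three-parameter) Weibull. It only generates data; it does not reproduce Rouder et al.'s Bayesian estimation. The simplifications are assumptions for illustration: the shift is common to all subjects, only scale and shape vary by subject (drawn from gamma distributions), and all hyperparameter values are hypothetical.

```python
import random

def simulate_hierarchical_weibull(n_subjects, n_trials, shift_ms, seed=0):
    """Simulate RTs = shift + Weibull(scale_s, shape_s), where each subject's
    scale and shape are drawn from gamma distributions (hypothetical values)."""
    rng = random.Random(seed)
    data = {}
    for s in range(n_subjects):
        # random.gammavariate(shape, scale): subject-level parameters
        scale = rng.gammavariate(20.0, 15.0)   # mean 300 ms
        shape = rng.gammavariate(16.0, 0.125)  # mean 2.0
        # random.weibullvariate(scale, shape): trial-level RTs above the shift
        data[s] = [shift_ms + rng.weibullvariate(scale, shape)
                   for _ in range(n_trials)]
    return data

# 20 trials per subject, echoing the small-sample case discussed above.
rts = simulate_hierarchical_weibull(n_subjects=10, n_trials=20,
                                    shift_ms=250.0, seed=1)
for s, values in rts.items():
    print(s, round(min(values)), round(sum(values) / len(values)))
```

The shift acts as a floor on every RT, and the gamma draws produce subject-to-subject differences in both the spread and the skew of the RT distributions.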

In conclusion, the routine application of Bayesian techniques will still take a few years to take hold in experimental psychology. In the meantime, we propose either to spend one degree of freedom on a transformation that maps the observed data into a representation compatible with the statistical model we use for inference or prediction, or to use a GLMM to this end. Typically, either approach will yield normally distributed residuals. In this measurement space, we can not only interpret the fixed effects of our experimental design but also, in a single sweep, estimate how these effects vary and correlate among subjects and items. Experimental psychologists have collected much reliable, theoretically relevant information on subjects and items for many years. Perhaps the time has come to use it.
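"Spending one degree of freedom" on a transformation amounts to estimating λ from the data. Below is a minimal sketch of the standard approach from the Box-Cox paper, which maximizes the profile log-likelihood over a grid of λ values. It assumes a simple cell-means model (one mean per condition, one common residual variance) rather than the mixed models BoxCox.jl works with. The function names are hypothetical and are not the package's API.

```python
import math
import random
from collections import defaultdict

def box_cox(y, lam):
    """Box-Cox transform of one strictly positive observation."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

def profile_loglik(ys, groups, lam):
    """Profile log-likelihood of lambda (up to a constant) under a
    cell-means model: one mean per group, common residual variance."""
    z = [box_cox(y, lam) for y in ys]
    sums, counts = defaultdict(float), defaultdict(int)
    for zi, g in zip(z, groups):
        sums[g] += zi
        counts[g] += 1
    rss = sum((zi - sums[g] / counts[g]) ** 2 for zi, g in zip(z, groups))
    n = len(ys)
    # The Jacobian term makes likelihoods on different lambda scales comparable.
    log_jacobian = (lam - 1.0) * sum(math.log(y) for y in ys)
    return -0.5 * n * math.log(rss / n) + log_jacobian

def estimate_lambda(ys, groups, grid=None):
    """Grid search for the lambda maximising the profile log-likelihood."""
    if grid is None:
        grid = [i / 100 for i in range(-200, 201)]  # -2.00 to 2.00
    return max(grid, key=lambda lam: profile_loglik(ys, groups, lam))

# Simulated lognormal RTs: log(RT) is normal with equal SD in both conditions,
# so the estimate should land near 0 (the log transform).
random.seed(1)
groups = ["A"] * 1000 + ["B"] * 1000
ys = [random.lognormvariate(6.2 if g == "A" else 6.4, 0.5) for g in groups]
lam_hat = estimate_lambda(ys, groups)
print(lam_hat)
```

The single extra parameter is λ itself. Once it is fixed, the transformed data are analysed with the usual model, which is where the "one degree of freedom" goes.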
