Commit
Built site for gh-pages
palday committed Sep 11, 2024
1 parent 512277c commit 9c04da2
Showing 4 changed files with 8 additions and 8 deletions.
2 changes: 1 addition & 1 deletion .nojekyll
@@ -1 +1 @@
b2dfdda6
54842272
2 changes: 1 addition & 1 deletion search.json
@@ -1028,7 +1028,7 @@
"href": "transformation.html#response",
"title": "Transformations of the predictors and the response",
"section": "2 Response",
"text": "2 Response\nIn addition to transforming the predictors, we can also consider transforming the response (dependent variable). There are many different common possibilities – the log, the inverse/reciprocal, or even the square root – and it can be difficult to choose an appropriate one. For non-negative response (e.g., reaction time in many experiences), Box & Cox (1964) figured out a generalization that subsumes all of these possibilities:\n\\[\n\\begin{cases}\n\\frac{y^{\\lambda} - 1}{\\lambda} &\\quad \\lambda \\neq 0 \\\\\n\\log y &\\quad \\lambda = 0\n\\end{cases}\n\\]\nOur task is thus finding the appropriate \\(\\lambda\\) such that the conditional distribution is as normal as possible. In other words, we need to find \\(\\lambda\\) that results in the residuals are as normal as possible. I’ve emphasized conditional distribution and residuals because that’s where the normality assumption actually lies in the linear (mixed) model. The assumption is not that the response y, i.e. the uncondidtional distribution, is normally distributed, but rather that the residuals are normally distributed. In other words, we can only check the quality of a given \\(\\lambda\\) by fitting a model to the transformed response. Fortunately, BoxCox.jl makes this easy.\nThe fit function takes two arguments: - the transformation to be fit (i.e. BoxCoxTransformation) - the model fit to the original data\n\nusing BoxCox\nbc = fit(BoxCoxTransformation, slp)\n\nBox-Cox transformation\n\nestimated λ: -1.0747\nresultant transformation:\n\n y^-1.1 - 1\n------------\n -1.1\n\n\n\nFor large models, fitting the BoxCoxTransformation can take a while because a mixed model must be repeatedly fit after each intermediate transformation.\n\nAlthough we receive a single “best” value (approximately -1.0747) from the fitting process, it is worthwhile to look at the profile likelihood plot for the transformation:\n\n# we need a plotting backend loaded before we can use plotting functionality\n# from BoxCox.jl\nusing CairoMakie\nboxcoxplot(bc; conf_level=0.95)\n\n\n\n\n\n\n\n\nHere we see that -1 is nearly as good. Moreover, time\\(^{-1}\\) has a natural interpretation as speed. In other words, we can model reaction speed instead of reaction time. Then instead of seeing whether participants take longer to respond with each passing day, we can see whether their speed increases or decreases. In both cases, we’re looking at whether they respond faster or slower and even the terminology fast and slow suggests that speed is easily interpretable.\nIf we recall the definition of the Box-Cox transformation from above: \\[\n\\begin{cases}\n\\frac{y^{\\lambda} - 1}{\\lambda} &\\quad \\lambda \\neq 0 \\\\\n\\log y &\\quad \\lambda = 0\n\\end{cases}\n\\]\nthen we see that there is a normalizing denominator that flips the sign when \\lambda < 0. If we use the full Box-Cox formula, then the sign of the effect in our transformed and untransformed model remains the same. 
While useful at times, speed has a natural interpretation and so we instead use the power relation, which is the actual key component, without normalization.\nBecause reaction is stored in milliseconds, we use 1000 / reaction instead of 1 / reaction so that our speed units are responses per second.\n\nmodel_bc = fit(MixedModel,\n @formula(1000 / reaction ~ 1 + days + (1 + days | subj)),\n dataset(:sleepstudy))\n\n\n\n\n\nEst.\nSE\nz\np\nσ_subj\n\n\n(Intercept)\n3.9658\n0.1056\n37.55\n<1e-99\n0.4190\n\n\ndays\n-0.1110\n0.0151\n-7.37\n<1e-12\n0.0566\n\n\nResidual\n0.2698\n\n\n\n\n\n\n\n\n\nFor our original model on the untransformed scale, the intercept was approximately 250, which means that the average response time was about 250 milliseconds. For the model on the speed scale, we have an intercept about approximately 4, which means that the average response speed is about 4 responses per second, which implies that the the average response time is 250 milliseconds. In other words, our new results are compatible with our previous estimates.\nThis example also makes something else clear: much like transformations of the predictors, transforming the response changes the hypothesis being tested. While it is relatively easy to re-formulate hypothesis about reaction time into hypotheses about speed, it can be harder to re-formulate other hypotheses. For example, a log transformation of the response changes the hypotheses on the original scale from additive effects to multiplicative effects. As a very simple example, consider two observations y1 = 100 and y2 = 1000. On the original scale, there y2 = 10 * y1. But on the \\(\\log_{10}\\) scale, log10(y2) = 1 + log10(y1). In other words: I recommend keeping interpretability of the model in mind before blindly chasing perfectly fulfilling all model assumptions.\nThere are two other little tricks that BoxCox.jl has to offer. First, the fitted transformation will work just like a function:\n\nbc(1000)\n\n0.9299202243766808\n\n\n\nbc.(response(slp))\n\n180-element Vector{Float64}:\n 0.9280071533109451\n 0.9281008004980732\n 0.9280202732761681\n 0.9285950348759796\n 0.9287948232963694\n 0.9290453817984637\n 0.9289143340819649\n 0.9283762261753863\n 0.9291020425612259\n 0.9292149261473703\n ⋮\n 0.9282383517103107\n 0.9284326403128913\n 0.9285246243376963\n 0.928352836088982\n 0.9286450699122083\n 0.9286737217944607\n 0.9287229751967703\n 0.9288548849797978\n 0.9288308689512543\n\n\nSecond, the decades since the publication of Box & Cox (1964) have seen many proposed extensions to handle that that may not be strictly positive. One such proposal from Yeo & Johnson (2000) is also implemented in BoxCox.jl. The definition of the transformation is:\n\\[\n\\begin{cases} ((x_i+1)^\\lambda-1)/\\lambda & \\text{if }\\lambda \\neq 0, y \\geq 0 \\\\\n \\log(y_i + 1) & \\text{if }\\lambda = 0, y \\geq 0 \\\\\n -((-x_i + 1)^{(2-\\lambda)} - 1) / (2 - \\lambda) & \\text{if }\\lambda \\neq 2, y < 0 \\\\\n -\\log(-x_i + 1) & \\text{if }\\lambda = 2, y < 0\n\\end{cases}\n\\]\nand we can fit it in BoxCox.jl with\n\nyj = fit(YeoJohnsonTransformation, slp)\n\nYeo-Johnson transformation\n\nestimated λ: -1.0700\np-value: <1e-09\n\nresultant transformation:\n\nFor y ≥ 0,\n\n (y + 1)^-1.1 - 1\n------------------\n -1.1\n\n\nFor y < 0:\n\n -((-y + 1)^(2 - -1.1) - 1)\n----------------------------\n (2 - -1.1)\n\n\n\nf = boxcoxplot(yj; conf_level=0.95)\nf[0, :] = Label(f, \"Yeo-Johnson\"; tellwidth=false)\nf",
"text": "2 Response\nIn addition to transforming the predictors, we can also consider transforming the response (dependent variable). There are many different common possibilities – the log, the inverse/reciprocal, or even the square root – and it can be difficult to choose an appropriate one. For non-negative response (e.g., reaction time in many experiences), Box & Cox (1964) figured out a generalization that subsumes all of these possibilities:\n\\[\n\\begin{cases}\n\\frac{y^{\\lambda} - 1}{\\lambda} &\\quad \\lambda \\neq 0 \\\\\n\\log y &\\quad \\lambda = 0\n\\end{cases}\n\\]\nOur task is thus finding the appropriate \\(\\lambda\\) such that the conditional distribution is as normal as possible. In other words, we need to find \\(\\lambda\\) that results in the residuals are as normal as possible. I’ve emphasized conditional distribution and residuals because that’s where the normality assumption actually lies in the linear (mixed) model. The assumption is not that the response y, i.e. the uncondidtional distribution, is normally distributed, but rather that the residuals are normally distributed. In other words, we can only check the quality of a given \\(\\lambda\\) by fitting a model to the transformed response. Fortunately, BoxCox.jl makes this easy.\nThe fit function takes two arguments: - the transformation to be fit (i.e. BoxCoxTransformation) - the model fit to the original data\n\nusing BoxCox\nbc = fit(BoxCoxTransformation, slp)\n\nBox-Cox transformation\n\nestimated λ: -1.0747\nresultant transformation:\n\n y^-1.1 - 1\n------------\n -1.1\n\n\n\nFor large models, fitting the BoxCoxTransformation can take a while because a mixed model must be repeatedly fit after each intermediate transformation.\n\nAlthough we receive a single “best” value (approximately -1.0747) from the fitting process, it is worthwhile to look at the profile likelihood plot for the transformation:\n\n# we need a plotting backend loaded before we can use plotting functionality\n# from BoxCox.jl\nusing CairoMakie\nboxcoxplot(bc; conf_level=0.95)\n\n\n\n\n\n\n\n\nHere we see that -1 is nearly as good. Moreover, time\\(^{-1}\\) has a natural interpretation as speed. In other words, we can model reaction speed instead of reaction time. Then instead of seeing whether participants take longer to respond with each passing day, we can see whether their speed increases or decreases. In both cases, we’re looking at whether they respond faster or slower and even the terminology fast and slow suggests that speed is easily interpretable.\nIf we recall the definition of the Box-Cox transformation from above: \\[\n\\begin{cases}\n\\frac{y^{\\lambda} - 1}{\\lambda} &\\quad \\lambda \\neq 0 \\\\\n\\log y &\\quad \\lambda = 0\n\\end{cases}\n\\]\nthen we see that there is a normalizing denominator that flips the sign when \\lambda < 0. If we use the full Box-Cox formula, then the sign of the effect in our transformed and untransformed model remains the same. 
While useful at times, speed has a natural interpretation and so we instead use the power relation, which is the actual key component, without normalization.\nBecause reaction is stored in milliseconds, we use 1000 / reaction instead of 1 / reaction so that our speed units are responses per second.\n\nmodel_bc = fit(MixedModel,\n @formula(1000 / reaction ~ 1 + days + (1 + days | subj)),\n dataset(:sleepstudy))\n\n\n\n\n\nEst.\nSE\nz\np\nσ_subj\n\n\n(Intercept)\n3.9658\n0.1056\n37.55\n<1e-99\n0.4190\n\n\ndays\n-0.1110\n0.0151\n-7.37\n<1e-12\n0.0566\n\n\nResidual\n0.2698\n\n\n\n\n\n\n\n\n\nFor our original model on the untransformed scale, the intercept was approximately 250, which means that the average response time was about 250 milliseconds. For the model on the speed scale, we have an intercept about approximately 4, which means that the average response speed is about 4 responses per second, which implies that the the average response time is 250 milliseconds. In other words, our new results are compatible with our previous estimates.\nThis example also makes something else clear: much like transformations of the predictors, transforming the response changes the hypothesis being tested. While it is relatively easy to re-formulate hypothesis about reaction time into hypotheses about speed, it can be harder to re-formulate other hypotheses. For example, a log transformation of the response changes the hypotheses on the original scale from additive effects to multiplicative effects. As a very simple example, consider two observations y1 = 100 and y2 = 1000. On the original scale, there y2 = 10 * y1. But on the \\(\\log_{10}\\) scale, log10(y2) = 1 + log10(y1). In other words: I recommend keeping interpretability of the model in mind before blindly chasing perfectly fulfilling all model assumptions.\nThere are two other little tricks that BoxCox.jl has to offer. First, the fitted transformation will work just like a function:\n\nbc(1000)\n\n0.9299202243766808\n\n\n\nbc.(response(slp))\n\n180-element Vector{Float64}:\n 0.9280071533109451\n 0.9281008004980732\n 0.9280202732761681\n 0.9285950348759796\n 0.9287948232963694\n 0.9290453817984637\n 0.9289143340819649\n 0.9283762261753863\n 0.9291020425612259\n 0.9292149261473703\n ⋮\n 0.9282383517103107\n 0.9284326403128913\n 0.9285246243376963\n 0.928352836088982\n 0.9286450699122083\n 0.9286737217944607\n 0.9287229751967703\n 0.9288548849797978\n 0.9288308689512543\n\n\nSecond, the decades since the publication of Box & Cox (1964) have seen many proposed extensions to handle that that may not be strictly positive. One such proposal from Yeo & Johnson (2000) is also implemented in BoxCox.jl. The definition of the transformation is:\n\\[\n\\begin{cases} ((y_+1)^\\lambda-1)/\\lambda & \\text{if }\\lambda \\neq 0, y \\geq 0 \\\\\n \\log(y_i + 1) & \\text{if }\\lambda = 0, y \\geq 0 \\\\\n -((-y_ + 1)^{(2-\\lambda)} - 1) / (2 - \\lambda) & \\text{if }\\lambda \\neq 2, y < 0 \\\\\n -\\log(-y_ + 1) & \\text{if }\\lambda = 2, y < 0\n\\end{cases}\n\\]\nand we can fit it in BoxCox.jl with\n\nyj = fit(YeoJohnsonTransformation, slp)\n\nYeo-Johnson transformation\n\nestimated λ: -1.0700\np-value: <1e-09\n\nresultant transformation:\n\nFor y ≥ 0,\n\n (y + 1)^-1.1 - 1\n------------------\n -1.1\n\n\nFor y < 0:\n\n -((-y + 1)^(2 - -1.1) - 1)\n----------------------------\n (2 - -1.1)\n\n\n\nf = boxcoxplot(yj; conf_level=0.95)\nf[0, :] = Label(f, \"Yeo-Johnson\"; tellwidth=false)\nf",
"crumbs": [
"Contrast coding and transformations",
"Transformations of the predictors and the response"
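
Aside: the Box-Cox definition quoted in the search entry above is simple enough to sanity-check directly. The following is a minimal Julia sketch of the two-case formula, not part of this commit or of BoxCox.jl (boxcox_sketch is an illustrative name; the package's fitted transformation should be preferred in practice):

    # Minimal sketch of the two-case Box-Cox formula quoted above; assumes y > 0.
    function boxcox_sketch(y::Real, lambda::Real)
        y > 0 || throw(DomainError(y, "the Box-Cox transformation requires y > 0"))
        # log(y) is the limiting case of (y^λ - 1)/λ as λ → 0
        return iszero(lambda) ? log(y) : (y^lambda - 1) / lambda
    end

    boxcox_sketch(1000, -1.0747)  # ≈ 0.9299, matching bc(1000) in the entry above

Evaluating the sketch at the estimated λ = -1.0747 reproduces the bc(1000) value reported in the entry, a quick confirmation that the quoted formula and the fitted transformation agree.
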
2 changes: 1 addition & 1 deletion sitemap.xml
@@ -66,7 +66,7 @@
</url>
<url>
<loc>https://RePsychLing.github.io/SMLP2024/transformation.html</loc>
<lastmod>2024-09-10T21:34:41.685Z</lastmod>
<lastmod>2024-09-11T07:05:44.248Z</lastmod>
</url>
<url>
<loc>https://RePsychLing.github.io/SMLP2024/check_emotikon_transform.html</loc>
10 changes: 5 additions & 5 deletions transformation.html
@@ -7,7 +7,7 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">

<meta name="author" content="Phillip Alday">
<meta name="dcterms.date" content="2024-09-10">
<meta name="dcterms.date" content="2024-09-11">

<title>Transformations of the predictors and the response – SMLP2024: Advanced Frequentist Track</title>
<style>
@@ -431,7 +431,7 @@ <h1 class="title">Transformations of the predictors and the response</h1>
<div>
<div class="quarto-title-meta-heading">Published</div>
<div class="quarto-title-meta-contents">
<p class="date">2024-09-10</p>
<p class="date">2024-09-11</p>
</div>
</div>

@@ -876,10 +876,10 @@ <h2 data-number="2" class="anchored" data-anchor-id="response"><span class="head
</div>
<p>Second, the decades since the publication of <span class="citation" data-cites="Box1964">Box &amp; Cox (<a href="#ref-Box1964" role="doc-biblioref">1964</a>)</span> have seen many proposed extensions to handle data that may not be strictly positive. One such proposal from <span class="citation" data-cites="YeoJohnson2000">Yeo &amp; Johnson (<a href="#ref-YeoJohnson2000" role="doc-biblioref">2000</a>)</span> is also implemented in BoxCox.jl. The definition of the transformation is:</p>
<p><span class="math display">\[
\begin{cases} ((x_i+1)^\lambda-1)/\lambda &amp; \text{if }\lambda \neq 0, y \geq 0 \\
\begin{cases} ((y_i+1)^\lambda-1)/\lambda &amp; \text{if }\lambda \neq 0, y \geq 0 \\
\log(y_i + 1) &amp; \text{if }\lambda = 0, y \geq 0 \\
-((-x_i + 1)^{(2-\lambda)} - 1) / (2 - \lambda) &amp; \text{if }\lambda \neq 2, y &lt; 0 \\
-\log(-x_i + 1) &amp; \text{if }\lambda = 2, y &lt; 0
 -((-y_i + 1)^{(2-\lambda)} - 1) / (2 - \lambda) &amp; \text{if }\lambda \neq 2, y &lt; 0 \\
 -\log(-y_i + 1) &amp; \text{if }\lambda = 2, y &lt; 0
\end{cases}
\]</span></p>
<p>and we can fit it in BoxCox.jl with</p>
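
Aside: as with the Box-Cox case above, the corrected four-case Yeo-Johnson definition can be sanity-checked with a minimal Julia sketch. This is illustrative only and not part of this commit; the actual implementation is BoxCox.jl's YeoJohnsonTransformation, shown in the search entry above (yeojohnson_sketch is an illustrative name):

    # Minimal sketch of the four-case Yeo-Johnson formula shown above.
    function yeojohnson_sketch(y::Real, lambda::Real)
        if y >= 0
            # log1p(y) == log(y + 1) is the λ = 0 limiting case
            return iszero(lambda) ? log1p(y) : ((y + 1)^lambda - 1) / lambda
        else
            # the mirrored branch keeps the transformation defined for y < 0
            return lambda == 2 ? -log1p(-y) : -((-y + 1)^(2 - lambda) - 1) / (2 - lambda)
        end
    end

    yeojohnson_sketch(-3.0, 2.0)     # defined even for negative responses
    yeojohnson_sketch(250.0, -1.07)  # close to the Box-Cox value for large positive y

Unlike Box-Cox, the Yeo-Johnson transformation is defined for all real responses, which is exactly the extension motivating it in the text.
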
