Below are answers to a set of questions I received in the past from a valued colleague. To answer them, I first describe my mental model for thinking about these issues in a fairly general way.
Conceptual framework to think about experimental effects and individual differences
There are different uses of LMMs. If you are interested in individual differences in experimental effects (“random slopes”), you may want to relate them to “traditional” individual differences in IQ, SES, or years of education. For ease of exposition, let’s assume an A(2) x B(2) fixed-factor within-subject/within-item experiment with subjects and items as random factors, a covariate IQ for each subject, and a covariate word frequency for each item. I also assume we don’t have to worry about quadratic or higher-order trends possibly associated with the covariates; we may want to return to this later. For now we assume that a linear relationship with the dependent variable is adequate. So, given the usual reliability problems with continuous predictors (i.e., interactions between covariates require very large samples; McClelland & Judd, Psychological Bulletin, 1993), a plausible complex (but not maximal) model could look like this:
(1) `dv ~ 1 + A*B*(iq + wf) + (1 + A*B*wf | Subj) + (1 + A*B*iq | Item)`
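For concreteness, here is a minimal sketch of how model (1) could be fit with lme4 in R. The data frame `dat` and its column names are assumptions for illustration; `A` and `B` are assumed to be numeric ±0.5 contrasts, and `iq` and `wf` are assumed to be centered covariates.

```r
# Minimal sketch. Assumptions: data frame `dat` with one row per
# Subj x Item observation; A and B coded as numeric +/-0.5 contrasts;
# iq (per subject) and wf (per item) centered.
library(lme4)

m1 <- lmer(
  dv ~ 1 + A * B * (iq + wf) +
    (1 + A * B * wf | Subj) +  # only within-subject terms vary by Subj
    (1 + A * B * iq | Item),   # only within-item terms vary by Item
  data = dat
)
summary(m1)
```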
This yields 11 fixed effects (`A`, `B`, `iq`, `wf`, `A x B`, `A x iq`, `A x wf`, `B x iq`, `B x wf`, `A x B x iq`, and `A x B x wf`). If `A` is replaced with a between-subject factor `G` (e.g., gender), the formula looks like this:

(2) `dv ~ 1 + G*B*(iq + wf) + (1 + B*wf | Subj) + (1 + G*B*iq | Item)`
You still get the same number of fixed effects, but the random-effect structure changes. Only effects associated with within-subject factors or within-subject covariates may appear as subject-related variance components (VCs) and correlation parameters (CPs); only effects associated with within-item factors or within-item covariates may appear as item-related VCs and CPs. Thus, (1) and (2) illustrate the fundamental fact you must take into account when setting up an LMM: each fixed factor or covariate is either between-subject or within-subject, and either between-item or within-item; all four combinations are possible for factors and for covariates. In 90% of the cases, these choices are dictated by the study design or the data. So it is not up to you; you have to live with it. For the sake of completeness: a variable is a within-subject factor/covariate if you have measures on more than one level of the variable (i.e., `wf` is within-subject; `iq` is between-subject, unless you follow a subject longitudinally), and likewise for items (i.e., `iq` is within-item; `wf` is between-item, unless you systematically manipulate the number of exposures in an experiment or examine word frequency across historical use). You don’t have to worry about counterbalancing issues (i.e., subjects do not see all items; items are not seen by all subjects).

With this conceptual framework in place, it is important to think about covariates such as IQ, SES, and education not in terms of correlations (forget it!), but in terms of effects as reflected in the slope. For many colleagues this is an important change in the mental model, but a slope is a difference score just like an effect associated with a two-level factor: it measures the change in `dv` when you increase the covariate by one unit. So a large slope represents a large effect. Indeed, although you should never do this in the analysis, conceptually it is best to think of your covariates as two-level factors (high vs. low `wf`; high vs. low `iq`). Once you make this switch in the mental model, the interpretation of factor x covariate interactions in the fixed effects is clear: it is a test of parallel lines (i.e., if significant, the lines are not parallel, and the factor moderates the relation between covariate and `dv`). The interpretation of VCs should be clear, too: are there reliable individual (item) differences in the experimental and quasi-experimental effects? And finally, CPs inform you whether these effects correlate. Are people with a large frequency effect those with a large effect on your within-subject factors `A`, `B`, or `A x B`? Are the items that exhibit a large `iq` effect those that exhibit a large effect on your within-item factors `A` and `B`? (Just a reminder that in a 2 x 2 design with cells a, b, c, d (rowwise), the effect of `A` = a+b-c-d, `B` = a-b+c-d, and `A x B` = a-b-c+d. Thus, all three sources of variance map onto a simple difference between two cells vs. the two other cells.)

Within this framework, I can answer a few questions that are frequently asked:
What about for ID measures like IQ, SES, years of education?
These variables usually play no role in the random-effect structure for `Subj`, because they are between-subject variables. In a psycholinguistic experiment they could be relevant for the random-effect structure for `Item`. Obviously, the same item might be responded to differently by high- and low-IQ subjects. And the differences in the item-related IQ effect might correlate with the effect of a within-item manipulation `A`.

[Should] random subjects (intercept and especially slope) be included in … a model focusing on IDs?
Yes, but this applies only to within-subject covariates, not between-subject covariates.
[Should] random slopes (and/or their interaction with random intercepts) ... be included in models when IDs are the focus?
Of course, if you have hypotheses of the kind I sketched above. However, depending on the power of your study, the data may not support an LMM of this complexity. That’s why tests of hypotheses, in my opinion, can only meaningfully be carried out for identified (not degenerate or overparameterized) LMMs. Thus, this question requires that you adopt a strategy of model selection; I favor, for example, parsimonious model selection. This could be the topic of a second email. Or just read the Bates et al. (2015) paper on parsimonious mixed models. We are working on a new version of it. (It is funny that we had two other publications out of the first draft of this paper, but the most relevant part about model selection is still in this one.) This issue is not settled (e.g., Barr et al., 2013). However, the unsettled part relates to the relevance for fixed effects. If you are interested in VCs and CPs, I would argue strongly that you have no choice but to engage in model selection. Moreover, my strongest argument in favor of model selection, even in the absence of a theoretical interest in VCs and CPs, is that in an overparameterized LMM you simply don’t see whether there are significant VCs and CPs. If there are significant VCs and CPs, they may completely change the interpretation of your fixed effects. CPs represent `Subj` x effect or `Item` x effect interactions. And just like you usually cannot interpret a main effect of `A` in an unqualified way if it interacts with `B`, you cannot interpret a main effect of `A` if it interacts with `Subj` or `Item`.
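As a hedged illustration of what such model selection could look like in lme4 (a sketch under the assumptions of the fitting example above; `m1` is the hypothetical model object from that sketch, and `rePCA()` is lme4’s principal-components diagnostic for degenerate random-effect structures):

```r
library(lme4)

# Is model (1) supported by the data? Random-effect dimensions that
# account for ~0% of the variance signal an overparameterized model.
summary(rePCA(m1))

# A zero-correlation-parameter (ZCP) model as one step of simplification.
# Note: `||` does not expand factors, so this assumes A and B are numeric
# +/-0.5 contrasts (as in the earlier sketch).
m1_zcp <- lmer(
  dv ~ 1 + A * B * (iq + wf) +
    (1 + A * B * wf || Subj) +
    (1 + A * B * iq || Item),
  data = dat
)

# Likelihood-ratio test: do the correlation parameters improve the fit?
anova(m1_zcp, m1)
```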
What are the trade-offs, for example, between “shrinkage” versus better point estimates?
This question probably relates to the differences between (1) the within-subject correlation (i.e., for each subject, we compute the effect of A and the effect of B and correlate these two effects), (2) the correlation parameter estimated by the model, or (3) the visualization of the correlation parameter estimated by the model with the random effects for each subject. Let us assume Equation (1).
The advantage of (1) is that the pairs of differences are independent of each other; each subject contributes one difference for A and one difference for B. So you can use these scores in subsequent analyses (e.g., in multiple regressions or SEMs); the disadvantage is that these difference scores are not very reliable.
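For illustration, a minimal sketch of option (1) under the same assumptions as before (a hypothetical data frame `dat` with `A` and `B` as ±0.5 contrasts; all names are placeholders):

```r
# Per-subject cell means of the 2 x 2 design; cells a, b, c, d as in the
# reminder above (A and B assumed to be +/-0.5 contrasts).
cm <- aggregate(dv ~ Subj + A + B, data = dat, FUN = mean)

eff <- do.call(rbind, lapply(split(cm, cm$Subj), function(s) {
  a <- s$dv[s$A == -0.5 & s$B == -0.5]
  b <- s$dv[s$A == -0.5 & s$B ==  0.5]
  c <- s$dv[s$A ==  0.5 & s$B == -0.5]
  d <- s$dv[s$A ==  0.5 & s$B ==  0.5]
  data.frame(Subj = s$Subj[1],
             effA = a + b - c - d,   # effect of A for this subject
             effB = a - b + c - d)   # effect of B for this subject
}))

# The raw (unshrunken) within-subject correlation of the two effects:
cor(eff$effA, eff$effB)
```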
The advantage of (2) is that it reduces the unreliability of the difference scores. Consequently, CPs will be larger, or even only obtained after you correct for unreliability (i.e., after shrinkage). So if you have a hypothesis that there is a correlation between the effect of A and how responsive items are to IQ, you may not see a within-item correlation and conclude there is no evidence for a correlation, although the correlation may be significant as a reliable CP. The disadvantage of (2) is that it is just a “latent” model parameter. You don’t “see” the data pairs for each item that conceptually underlie the CP. So there is no way to use this information in subsequent multiple regressions or SEMs.
The advantage of (3) is that it beautifully visualizes the benefit of shrinkage for revealing correlations between effects. Just look at the figures in the Frontiers paper. The disadvantage is that the difference pairs used in these plots (or a mean-and-difference pair) are not independent of each other. These are conditional means for each subject (or item) based on the subject’s (item’s) data and the model parameters. The model parameters are estimated on the basis of all the data. Therefore, the predictions based on the model parameters induce a dependency. So, strictly speaking, you cannot use the values you get from `ranef()` in subsequent multiple regressions or SEMs.
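For completeness, a sketch of how the conditional means behind (3) could be extracted and plotted (again assuming the hypothetical model `m1` from above; this mirrors the idea of the Frontiers figures, not their exact code):

```r
# Conditional means ("BLUPs") of the subject-level deviations; one row
# per subject, one column per variance component. These are shrunken
# toward zero and are NOT independent across subjects.
re <- ranef(m1)$Subj

# Shrunken per-subject effect estimates = fixed effect + conditional mean.
shrunken_A <- fixef(m1)["A"] + re[["A"]]
shrunken_B <- fixef(m1)["B"] + re[["B"]]

# Fine for visualization, but do not feed these into regressions or SEMs.
plot(shrunken_A, shrunken_B,
     xlab = "Shrunken effect of A", ylab = "Shrunken effect of B")
```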
Should the approach with regard to random subject effects be different when the interaction of an ID measure and an experimental measure is used to index IDs, as people often do in psycholinguistics (e.g., the interaction of an ID measure and a garden-path manipulation predicting reading time)?
I am not quite sure whether I understood the question correctly, but my intuition is that the conceptual framework presented above should allow you to decide for yourself. If not, please rephrase.