move new example to datasets #15

Merged: 3 commits, Sep 13, 2024
Changes from all commits
57 changes: 19 additions & 38 deletions contrasts_kwdyz11.qmd
@@ -10,18 +10,17 @@ fig-format: png

```{julia}
#| code-fold: true
using AlgebraOfGraphics
using CairoMakie
using Chain
using DataFrames
using MixedModels
using RCall
using SMLP2024: dataset
using StatsBase
using StatsModels

CairoMakie.activate!(; type="png")

import ProgressMeter
progress = false
```

@@ -30,13 +29,13 @@ Many researchers have pointed out that contrasts should be "tested instead of, r

For a (quasi-)experimental set of data, there is (or should be) a clear _a priori_ theoretical commitment to specific hypotheses about differences between factor levels and how these differences enter into interactions with other factors. This specification should be used in the first LMM and reported, irrespective of the outcome. If alternative theories lead to alternative _a priori_ contrast specifications, both analyses are justified. If the observed means render the specification completely irrelevant, the comparisons originally planned could still be reported in a supplement.

In this script, we are working through a large number of different contrasts for the same data. The purpose is to introduce both the preprogrammed (canned) and the general options to specify hypotheses about main effects and interactions. Obviously, we do not endorse generating a plot of the means and specifying the contrasts accordingly. This is known as the [Texas sharpshooter](https://www.bayesianspectacles.org/origin-of-the-texas-sharpshooter/) fallacy. The link leads to an illustration and brief historical account by Wagenmakers (2018).
In this script, we are working through a large number of different contrasts for the same data. The purpose is to introduce both the preprogrammed ("canned") and the general options to specify hypotheses about main effects and interactions. Obviously, we do not endorse generating a plot of the means and specifying the contrasts accordingly. This is known as the [Texas sharpshooter](https://www.bayesianspectacles.org/origin-of-the-texas-sharpshooter/) fallacy. The link leads to an illustration and brief historical account by Wagenmakers (2018).

Irrespective of how results turn out, there is nothing wrong with specifying a set of post-hoc contrasts to gain a better understanding of what the data are trying to tell us. Of course, in an article or report about the study, the _a priori_ and post-hoc nature of contrast specifications must be made clear. Some kind of alpha-level adjustment (e.g., Bonferroni) may be called for, too. And, of course, there are grey zones.

There is quite a bit of statistical literature on contrasts. Two local references are @Brehm2022 and @Schad2020.
There is quite a bit of statistical literature on contrasts. Two "local" references are @Brehm2022 and @Schad2020.

For further readings see Further Readings in @Schad2020.
For further readings see "Further Readings" in @Schad2020.
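
To make the distinction between "canned" and fully general contrast specifications concrete, here is a minimal sketch of both routes with the StatsModels contrast types that can be passed to `MixedModels.fit`. The factor name `CTR`, its four levels, and the coefficient labels are illustrative assumptions, not the specification used later in this script.

```{julia}
# Canned route: successive differences between neighboring factor levels.
contr_canned = Dict(:CTR => SeqDiffCoding(; levels=["val", "sod", "dos", "dod"]))

# General route: each row of the matrix encodes one hypothesis as a weighted
# difference of level means; `labels` names the resulting model coefficients.
contr_hypotheses = Dict(
  :CTR => HypothesisCoding(
    [-1  1  0  0        # level 2 minus level 1
      0 -1  1  0        # level 3 minus level 2
      0  0 -1  1];      # level 4 minus level 3
    levels=["val", "sod", "dos", "dod"],
    labels=["2-1", "3-2", "4-3"],
  ),
)
```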

# Example data {#sec-data}

@@ -439,10 +438,7 @@ Three factors:
2 x 3 = 6 measures / subject

```{julia}
R"""
dat2 = readRDS("data/Exp_2x2x3.rds");
"""
@rget(dat2)
dat2 = dataset(:exp_2x2x3)
```

We select an LMM supported by the data.
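
The model-selection code itself falls in a part of the file not shown in this diff. Purely as an illustration of the kind of call involved, here is a hedged sketch; the formula, contrasts, and random-effects structure are assumptions, not the model actually selected.

```{julia}
# A sketch only: the fixed-effects formula and random-effects structure are
# assumptions for illustration, not the model selected in the full script.
contrasts = Dict(:A => EffectsCoding(), :B => EffectsCoding())
m_sketch = fit(
  MixedModel,
  @formula(dv ~ 1 + A * B + (1 | Subj)),
  DataFrame(dat2);
  contrasts,
)
```
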
@@ -482,35 +478,20 @@ A: A2 & B: B3 0.523243 0.950202 0.55 0.5819
───────────────────────────────────────────────────────────────────
```

The following figure also appears only in interactive chunk execution; in addition, the chunk generates an error when rendered.

```{r}
#| eval: true

library(tidyverse)
dat2 = readRDS("data/Exp_2x2x3.rds");

cbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73",
               "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

tbl1 <-
  dat2 |>
  group_by(Subj, A, B) |>
  reframe(N=n(), dv=mean(dv)) |>
  group_by(A, B) |>
  reframe(N=n(), dv_M=mean(dv), dv_SD=sd(dv), dv_SE=dv_SD/sqrt(N))
tbl1

fig1 <-
  tbl1 |>
  ggplot(aes(y=dv_M, x=B, group=A, color=A)) +
  geom_point() +
  geom_line() +
  scale_color_manual(values=cbPalette[2:3]) +
  theme_bw()

print(fig1)
NULL
```
```{julia}
using Chain
tbl1 = @chain DataFrame(dat2) begin
  groupby(_, [:Subj, :A, :B])
  combine(_, nrow => :n, :dv => mean => :dv)
  groupby(_, [:A, :B])
  combine(_,
    :dv => mean => :dv_M,
    :dv => std => :dv_SD,
    :dv => sem => :dv_SE)
end

fig1 = data(tbl1) * mapping(:B, :dv_M; color=:A) * (visual(Lines) + visual(Scatter))
draw(fig1)
```


3 changes: 0 additions & 3 deletions data/Exp_2x2x3.rds

This file was deleted.

22 changes: 17 additions & 5 deletions src/datasets.jl
@@ -2,7 +2,7 @@ _file(x) = joinpath(CACHE[], string(x, ".arrow"))

clear_scratchspaces!() = Scratch.clear_scratchspaces!(@__MODULE__)

const datasets =
const DATASETS =
CSV.read(
IOBuffer(
"""
@@ -21,6 +21,7 @@ fggk21_Child,c2fmn,1,61c91e00336e6f804e9f6b86986ebb4a14561cc4908b3a21cb27c113d2b
fggk21_Score,7fqx3,1,99d73ee705aaf5f4ee696eadbba992d0113ba6f467ce337a62a63853e4617400
kkl15,p8cea,2,90d7bb137c8613d7a15c8597c461aee7c7cb0f0989a07c80fc93e1fbe2e5c156
kwdyz11,4cv52,3,2fa23aa8aa25e1adb10183c8d29646ae0d19d6baef9d711c9906f7fa1b225571
exp_2x2x3,za9gs,1,cb09684b7373492e849c83f20a071b97f986123677134ac2ddb9ec0dcb32e503
"""
),
Table;
@@ -29,25 +30,34 @@ kwdyz11,4cv52,3,2fa23aa8aa25e1adb10183c8d29646ae0d19d6baef9d711c9906f7fa1b225571
)

if @isdefined(_cacheddatasets)
    empty!(_cacheddatasets) # start from an empty cache in case datasets has changed
    empty!(_cacheddatasets) # start from an empty cache in case DATASETS has changed
else
    const _cacheddatasets = Dict{Symbol, Arrow.Table}()
end

"""
    datasets()

Return a vector of the names of datasets available for use in [`dataset`](@ref).
"""
function datasets()
    return sort!(vcat(SMLP2024.DATASETS.dsname, MixedModelsDatasets.datasets()))
end
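
# Usage sketch (not part of the committed file; added only to illustrate the
# API described by the surrounding docstrings):
#
#     julia> datasets()           # names from DATASETS plus MixedModelsDatasets
#     julia> dataset("kwdyz11")   # an Arrow.Table, downloaded once and cached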

"""
    dataset(name::Union{Symbol, AbstractString})

Return as an `Arrow.Table` the dataset named `name`.

Available dataset names, their versions, the filenames on the osf.io site and an SHA2 checksum of their contents
are in the table `datasets`.
are in the table `DATASETS`.

The files are cached in the scratchspace for this package. The name of this directory is the value of `CACHE[]`.
"""
function dataset(nm::AbstractString)
    return get!(_cacheddatasets, Symbol(nm)) do # retrieve from cache if available, otherwise
        # check for nm in datasets table first so MMDS can be overridden
        rows = filter(==(nm) ∘ getproperty(:dsname), datasets)
        # check for nm in DATASETS table first so MMDS can be overridden
        rows = filter(==(nm) ∘ getproperty(:dsname), DATASETS)
        if isempty(rows)
            nm in MMDS || error("Dataset '$nm' is not available")
            MixedModelsDatasets.dataset(nm)
@@ -58,10 +68,12 @@ function dataset(nm::AbstractString)
            if ismissing(row.filename)
                load_quiver() # special-case `ratings` and `movies`
            else
                @info "Downloading dataset..."
                Downloads.download(
                    string("https://osf.io/", row.filename, "/download?version=", row.version),
                    fnm,
                )
                @info "done"
            end
        end
        if row.sha2 ≠ bytes2hex(open(sha2_256, fnm))