[R] Effect size of comparison of two levels of a factor in multiple linear regression
Christoph Mathys
cmathys at bidmc.harvard.edu
Sun Feb 3 16:09:50 CET 2008
Dear R users,
I have a linear model of the kind
outcome ~ treatment + covariate
where 'treatment' is a factor with three levels ("0", "1", and "2"),
and the covariate is continuous. Treatments "1" and "2" both have
regression coefficients significantly different from 0 when using
treatment contrasts with treatment "0" as the baseline. I would now like
to determine effect sizes (akin to Cohen's d in a two-sample comparison)
for the comparison to baseline of treatments "1" and "2". I have
illustrated a way to do this in the reproducible example below and am
grateful for any comments on the soundness of what I'm doing. I have not
yet found a way to determine confidence intervals for the effect sizes
derived below and would appreciate tips on that.
set.seed(123456) # Make session reproducible
# Set up the treatment factor with three levels and 100 observations
# each
treatment <- factor(c(rep(0, 100), rep(1, 100), rep(2, 100)))
# Simulate outcomes
outcome <- rep(NA, 300)
outcome[treatment==0] <- rnorm(100, 10, 5) # baseline: mean=10, sd=5
outcome[treatment==1] <- rnorm(100, 30, 5) # effect size 4
outcome[treatment==2] <- rnorm(100, 40, 5) # effect size 6
# Check effect sizes (Cohen's d)
cohens.d <- function (x, y) {(mean(x)-mean(y))/sqrt((var(x)+var(y))/2) }
cohens.d(outcome[treatment==1], outcome[treatment==0])
[1] 3.984774
cohens.d(outcome[treatment==2], outcome[treatment==0])
[1] 6.167798
# Sometimes standardized regression coefficients are recommended
# for determining effect size but that clearly doesn't work here:
coef(lm(scale(outcome) ~ treatment))
(Intercept) treatment1 treatment2
-1.233366 1.453152 2.246946
# The reason it doesn't work is that the difference of outcome
# means is divided by the sd of *all* outcomes:
(mean(outcome[treatment==1])-mean(outcome[treatment==0]))/sd(outcome)
[1] 1.453152
(mean(outcome[treatment==2])-mean(outcome[treatment==0]))/sd(outcome)
[1] 2.246946
# Now, create a situation where Cohen's d is impossible to
# calculate directly by introducing a continuous covariate:
covariate <- 1:300
outcome <- outcome + rnorm(300, covariate, 2)
model <- lm(scale(outcome) ~ treatment + scale(covariate))
coef(model)
(Intercept) treatment1 treatment2 scale(covariate)
-0.1720456 0.1994251 0.3167116 0.8753761
# Proposed way to determine effect size: simulate outcomes for each
# treatment level assuming the covariate to have a fixed value (here
# its mean value after standardization: zero)
library(MCMCpack)
no.of.sims <- 10000
sims.model <- MCMCregress(model, mcmc=no.of.sims)
sims.model[1:2,]
(Intercept) treatment1 treatment2 scale(covariate) sigma2
[1,] -0.1780735 0.2024111 0.3395233 0.8682119 0.002617449
[2,] -0.1521623 0.1773623 0.2956053 0.8764573 0.003529013
sims.treat0 <- rnorm(no.of.sims, sims.model[,"(Intercept)"], sqrt(sims.model[,"sigma2"]))
sims.treat1 <- rnorm(no.of.sims, sims.model[,"(Intercept)"] + sims.model[,"treatment1"], sqrt(sims.model[,"sigma2"]))
sims.treat2 <- rnorm(no.of.sims, sims.model[,"(Intercept)"] + sims.model[,"treatment2"], sqrt(sims.model[,"sigma2"]))
# Calculate Cohen's d for simulated values
cohens.d(sims.treat1, sims.treat0)
[1] 3.683093
cohens.d(sims.treat2, sims.treat0)
[1] 5.782622
These values are reasonably close to the ones (4 and 6) I plugged in at
the beginning. It would be even nicer to have a confidence interval for
them, but if I bootstrap one out of the simulated outcomes its width
depends on the number of simulations and is therefore arbitrary. If
anyone knew a better way to get at the effect sizes I'm looking for or
how I could also get confidence intervals for them, that would be
greatly appreciated.
Thanks,
Christoph
--
Christoph Mathys, M.S.
Music and Neuroimaging Laboratory
Beth Israel Deaconess Medical Center
and Harvard Medical School
More information about the R-help
mailing list