[R-sig-ME] discrepency paired test and lmer()
E.W.Tobi at lumc.nl
E.W.Tobi at lumc.nl
Mon Aug 18 11:20:18 CEST 2014
Dear All, as my previous question was posted at the height of the vacation period I would like to repost the following issue I am encountering:
I am trying to test a continues, normally distributed, variable (Y) within sibling pairs which are discordant, or concordant, for an exposure, also including singleton exposed or unexposed down the line.
The inclusion of singletons excludes a simple lm() model like:
lm(Y~exposure+family_id)
To account for the family relationship
I tried starting simple, by making the model comparable to the simple lm model when including only the discordant and concordant pairs... leaving the singletons out the mix for now... but this has already proven surprisingly hard.
If I test the discordant sibling pairs only, the following models yields the exact same outcome as a paired t-test and lm(Y~exposure+fam_id), as it should.
lmer(Y~exposure+(1|family_id))
However, adding exposure concordant pairs to the discordant pairs set (non-exposed & non-exposed : "0"-"0" pairs....) yields completely different results between lmer and lm.
Overall, the t-values are (much) higher than with lm(). I am tasting many Y's.. (my field is genomics), so this inflation is very apparent when the plotting the expected versus observed t-statistics. Typically the 'lambda' (the coefficient describing the relation between observed en expected statistic) is 1.2-1.6.
Does anybody have an idea on why lm() and lmer() differ in their t-value estimates when exposure concordant pairs are added?
And how to make a model resembling the more simple paired test?
To make things more complicated, we also have strong evidence that not all exposed siblings within a discordant pair will respond the same amounts of Y, but that the response is variable.
Therefore I added exposure status also as random slope, to keep things 'maximal'.
A standardized example:
library(lme4.0) # I am using the old stable version, but the inflation is also apparent in the new version.
set.seed(20289457)
# normally distributed variable, with added 'effect' to part of the individuals (exposed within certain sibling pairs)
Y <- rnorm(n=200, m=0.25, sd=0.06)+c(rnorm(n=50,m=0.05,sd=0.03),rep(0,150))
family_id <- as.factor(rep(seq(1,100,by=1),2))
exposure <- c(rep(1,50),rep(0,150))
siblingtype <- c(rep(1,50),rep(0,50),rep(1,50),rep(0,50)) # this variable is added to distinguish discordant exposure from concordant exposure pairs
dataset <- data.frame(Y,family_id,exposure, siblingtype)
# first we take a look at the outcome in the exposure discordant pairs.
coefficients(summary(lm(Y~exposure+family_id, data=dataset, subset= siblingtype==1)))[c(1,2),]
t.test(Y[1:50],Y[101:150],paired=T)
lmer(Y~exposure+(1|family_id), data=dataset, subset= siblingtype==1)
lmer(Y~exposure+(1+exposure|family_id), data=dataset, subset= siblingtype==1)
# all have identical t-values; paired tests are identical to the mixed models
# the unexposed-unexposed pairs have:
t.test(Y[51:100],Y[151:200],paired=T)
# no evidence for a difference, which should boost confidence on observed difference?
coefficients(summary(lm(Y~exposure+family_id, data=dataset)))[c(1,2),]
# indeed, in a simple lm() the t-value increases.
lmer(Y~exposure+(1|family_id), data=dataset)
# also in a lmer, but now with much greater t-value: of note after testing 100K's of Y's I can definitely say that this is an inflated statistic!
lmer(Y~exposure+(1+exposure|family_id), data=dataset)
# adding the random intercept somewhat downplays this increase. But the outcome is not identical. The inflation remains!
# off note, in my real data the Corr of the random effects is ~0.2-0.3, close to the 'theoretical' 0.25 for full siblings in genetic studies (however Y is not a genetic marker and may thus deviate somewhat from 0.25)
sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=Dutch_Netherlands.1252 LC_CTYPE=Dutch_Netherlands.1252 LC_MONETARY=Dutch_Netherlands.1252
[4] LC_NUMERIC=C LC_TIME=Dutch_Netherlands.1252
attached base packages:
[1] splines stats graphics grDevices utils datasets methods base
other attached packages:
[1] coxme_2.2-3 bdsmatrix_1.3-1 survival_2.37-7 gee_4.13-18 nlme_3.1-117 lme4.0_0.999999-4 lattice_0.20-29
[8] Matrix_1.1-4 plyr_1.8.1
loaded via a namespace (and not attached):
[1] grid_3.0.2 Rcpp_0.11.2 stats4_3.0.2 tools_3.0.2
Any insight is highly appreciated.
Best wishes,
Tobi
[[alternative HTML version deleted]]
More information about the R-sig-mixed-models
mailing list