[R-sig-ME] Same variable as both fixed and random
Joshua Wiley
jwiley.psych at gmail.com
Thu Jul 17 13:42:33 CEST 2014
Dear Thierry,
I completely agree with you. There is a distinction between treating
year as continuous and treating it as a factor.
That is why I made the point about DegreeDays. If DegreeDays were not
in the model, one might hypothesize that year would pick up some time
trends in climate change. But if climate change is itself directly
measured and in the model, then year would seem to be a general
proxy for any other effects that vary across years. In that sense,
I do not know that imposing a linear relationship is sensible, which
leaves the choice between treating year as a factor and treating it as
a random effect.
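As a rough illustration of that choice, here is a minimal sketch on simulated data (not Stephen's data; the data frame `dat`, the column `YearF`, and all effect sizes are made up for the example). It fits year once as a categorical fixed effect and once as a random intercept:

```r
library(lme4)

set.seed(1)
n_year  <- 10
n_state <- 20

# Simulated stand-in data: one row per year-by-state combination (hypothetical)
dat <- expand.grid(Year = seq_len(n_year), State = seq_len(n_state))
dat$DegreeDays <- rnorm(nrow(dat))
year_shift  <- rnorm(n_year,  sd = 2)  # year-to-year variation with no linear trend
state_shift <- rnorm(n_state, sd = 1)  # state-to-state variation
dat$Arrival <- 100 - 3 * dat$DegreeDays +
  year_shift[dat$Year] + state_shift[dat$State] + rnorm(nrow(dat))
dat$YearF <- factor(dat$Year)
dat$State <- factor(dat$State)

# Option 1: year as a categorical fixed effect
m_factor <- lmer(Arrival ~ DegreeDays + YearF + (1 | State), data = dat)

# Option 2: year as a random intercept
m_random <- lmer(Arrival ~ DegreeDays + (1 | State) + (1 | YearF), data = dat)

length(fixef(m_factor))  # 11: intercept, DegreeDays, and 9 year contrasts
length(fixef(m_random))  # 2: intercept and DegreeDays
VarCorr(m_random)        # includes an estimate of the between-year variance
```

With ten years, the factor version spends nine fixed-effect parameters on year, while the random-intercept version spends a single variance parameter and reports the between-year variability directly.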
Thanks for the examples, they are an excellent way to show the point.
Sincerely,
Josh
On Thu, Jul 17, 2014 at 8:39 PM, ONKELINX, Thierry
<Thierry.ONKELINX at inbo.be> wrote:
> Dear Joshua,
>
> I agree when the variable is a factor, because then the model with both a fixed and a random effect becomes unidentifiable. But I disagree when the variable is used as a continuous variable in the fixed effects and also makes sense as a factor. In that case (provided the variable has enough levels) it makes sense to add the variable as a continuous fixed effect and as a random intercept. I've made a toy example below. The first model has a linear trend along year with additional year-to-year variation. In the second, the true effect is quadratic but is modeled as a linear effect. In the third, the fixed effect uses a quadratic model. The random effects of the first two models capture the discrepancy between the fixed-effect model and the true trend. The random effects in the third model have zero variance because they are not relevant.
>
> Best regards,
>
> Thierry
>
> library(lme4)
> library(ggplot2)
> set.seed(123546)
> n.year <- 10
> n.replicate <- 10
> intercept <- 1
> trend <- 2
> quadratic <- -0.5
> sd.noise <- 1
> sd.year <- 2
> test <- expand.grid(year = seq_len(n.year), replicate = seq_len(n.replicate))
>
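> # One random shift per year; indexing by year makes explicit that all replicates share their year's shift
> test$y.true <- intercept + trend * test$year + rnorm(n.year, sd = sd.year)[test$year]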
> test$y <- test$y.true + rnorm(nrow(test), sd = sd.noise)
>
> # Model 1: linear fixed trend in year plus a random intercept per year
> model <- lmer(y ~ year + (1|year), data = test)
> test$fit <- fitted(model)
> test$global <- predict(model, re.form = ~0)
> ggplot(test, aes(x = year, y = y.true)) + geom_line() + geom_point(aes(y = y)) + geom_line(aes(y = fit), colour = "red") + geom_line(aes(y = global), colour = "blue") + geom_abline(intercept = intercept, slope = trend, colour = "magenta")
> plot(ranef(model)$year[, 1])
>
> # Model 2: the true trend is quadratic, but the fixed effect is still linear
> test$y.true <- intercept + trend * test$year + quadratic * test$year ^ 2
> test$y <- test$y.true + rnorm(nrow(test), sd = sd.noise)
> model <- lmer(y ~ year + (1|year), data = test)
> test$fit <- fitted(model)
> test$global <- predict(model, re.form = ~0)
> ggplot(test, aes(x = year, y = y.true)) + geom_line() + geom_point(aes(y = y)) + geom_line(aes(y = fit), colour = "red") + geom_line(aes(y = global), colour = "blue") + stat_function(fun = function(x){intercept + trend * x + quadratic * x ^ 2}, colour = "magenta", geom = "line")
>
> plot(ranef(model)$year[, 1])
>
> # Model 3: quadratic fixed effect, so the year random intercepts have nothing left to capture
> model <- lmer(y ~ poly(year, 2) + (1|year), data = test)
> test$fit <- fitted(model)
> test$global <- predict(model, re.form = ~0)
> ggplot(test, aes(x = year, y = y.true)) + geom_line() + geom_point(aes(y = y)) + geom_line(aes(y = fit), colour = "red") + geom_line(aes(y = global), colour = "blue") + stat_function(fun = function(x){intercept + trend * x + quadratic * x ^ 2}, colour = "magenta", geom = "line")
> plot(ranef(model)$year[, 1])
>
>
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
> + 32 2 525 02 51
> + 32 54 43 61 85
> Thierry.Onkelinx at inbo.be
> www.inbo.be
>
> To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
> ~ Sir Ronald Aylmer Fisher
>
> The plural of anecdote is not data.
> ~ Roger Brinner
>
> The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
>
> -----Original message-----
> From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On behalf of Joshua Wiley
> Sent: Thursday 17 July 2014 11:24
> To: Stephen Mayor
> CC: r-sig-mixed-models at r-project.org
> Subject: Re: [R-sig-ME] Same variable as both fixed and random
>
> Hi Stephen,
>
> In your example, I would recommend not including year as both a fixed and random effect. (Note that Arrival ~ DegreeDays + Year + (1 + Year | State), i.e., allowing the effect of year to differ by state or some such, would be a different scenario.)
>
> You will partial the most variability out of the estimates by specifying Year as a fixed effect. However, if it is treated categorically, this will cost quite a few extra parameters, and you will also not get an estimate of the overall variability in the intercept by year. If those specific ten years are not of interest, and you are controlling for the "key" feature that you expect to change, namely DegreeDays, then I would suggest:
>
> lmer(Arrival ~ DegreeDays + Longitude + Latitude + (1 | State) + (1 | Year))
>
> A separate issue is how longitude and latitude are included (e.g., depending on the precision of your data, it may be helpful to allow a stronger similarity between nearby locations, although it may not matter much if you only have data at the level of State).
>
> Cheers,
>
> Josh
>
>
> On Wed, Jul 16, 2014 at 2:47 AM, Stephen Mayor <smayor at neoninc.org> wrote:
>> Hello,
>> When should a variable be specified as both a fixed effect AND as a random effect? If a single variable is defined as both fixed and random, how does one interpret the coefficients? Is it 'sloppy' practice to include it as both? Should one be cautious specifying a variable twice, or is it actually more conservative to do so?
>>
>> I am interested in your thoughts in general, but here is a simplified example if helpful:
>> I am interested in the date of arrival of house wren, a migratory bird, to each state in the US as a response to year, degree days (climate), latitude, longitude, state. I primarily want to test if there is a temporal trend in earlier arrival in more recent years, as a result of recent climatic warming.
>>
>> I specified the model as follows, treating state as a random group. Because I am interested in testing a linear trend across years, I specified Year as a fixed (and continuous) effect. But I am unsure as to whether I should ALSO specify it as a random factor, because I am interested in making inferences beyond the 10 years of data that I have and I don't have any specific interest in these 10 years over any other period.
>>
>> lmer(Arrival ~ DegreeDays + Year + Longitude + Latitude + (1|State) )
>> OR lmer(Arrival ~ DegreeDays + Year + Longitude + Latitude + (1|State) + (1|Year))
>>
>> Thanks in advance for any help or suggestions, Stephen.
>>
>>
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
>
>
> --
> Joshua F. Wiley
> Ph.D. Student, UCLA Department of Psychology
> http://joshuawiley.com/
> Senior Analyst, Elkhart Group Ltd.
> http://elkhartgroup.com
> Office: 260.673.5518
>