[R] Understanding the intercept value in a multiple linear regression with categorical values

Joao Azevedo joao.c.azevedo at gmail.com
Mon Jul 30 01:57:52 CEST 2012


Hi!

You're right. I was misinterpreting the way the coefficients were
calculated. Reading about the method of least squares helped me
clarify some of my doubts.
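A minimal sketch of the least-squares step mentioned above, assuming the default treatment contrasts: lm() builds a design matrix from the factors and finds the coefficients that minimise the squared residuals. Here the normal equations are solved directly, which is fine for a small, well-conditioned example like this one (lm() itself uses a QR decomposition rather than forming X'X). The variable names X, y and b are illustrative.

```r
# Design matrix for the additive model: an intercept column plus 0/1
# indicator columns woolB, tensionM and tensionH (treatment contrasts assumed)
X <- model.matrix(breaks ~ wool + tension, data = warpbreaks)
y <- warpbreaks$breaks

# Least-squares estimate from the normal equations: (X'X) b = X'y
b <- solve(crossprod(X), crossprod(X, y))

# Compare with the coefficients reported by lm()
fit <- lm(breaks ~ wool + tension, data = warpbreaks)
cbind(normal_equations = drop(b), lm = coef(fit))
```

Both columns should agree to numerical precision, with an intercept of about 39.278.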

Thanks for your tips!

--
Joao.


On Fri, Jul 27, 2012 at 2:36 PM, Jean V Adams <jvadams at usgs.gov> wrote:
> Joao,
>
> Your intuition is correct, the intercept represents the predicted value for
> wool A and tension L.  But, you're tripping up on how to figure out that
> predicted value.  In the model that you fit, the predicted value for wool A
> and tension L is not simply the mean of the observations for wool A and
> tension L, because there are only main effects in the model (no
> interaction).  Try this:
>
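> # observed cell means for each wool x tension combination
> with(warpbreaks, tapply(breaks, list(wool, tension), mean))
>
> # main effects only: fitted cell means need not equal the observed ones
> fit <- lm(breaks ~ wool + tension, data=warpbreaks)
> with(warpbreaks, tapply(fitted(fit), list(wool, tension), mean))
>
> # with the interaction: fitted cell means equal the observed ones
> fit2 <- lm(breaks ~ wool*tension, data=warpbreaks)
> with(warpbreaks, tapply(fitted(fit2), list(wool, tension), mean))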
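A minimal sketch of the point made above, assuming the default treatment contrasts. In the additive model the intercept is the fitted value for wool A and tension L. Because warpbreaks is balanced (9 observations per wool/tension cell), that fitted value works out to mean(wool A) + mean(tension L) - grand mean, roughly 31.04 + 36.39 - 28.15 = 39.28, rather than the observed A/L cell mean of roughly 44.56. Only the model with the interaction reproduces the observed cell mean as its intercept. The names fit, fit2 and ref are illustrative.

```r
# Additive model and model with interaction (treatment contrasts assumed)
fit  <- lm(breaks ~ wool + tension, data = warpbreaks)
fit2 <- lm(breaks ~ wool * tension, data = warpbreaks)

# The reference cell: wool A, tension L
ref <- data.frame(wool    = factor("A", levels = levels(warpbreaks$wool)),
                  tension = factor("L", levels = levels(warpbreaks$tension)))

# 1. The intercept equals the model's prediction for the reference cell
unname(coef(fit)[1])
unname(predict(fit, newdata = ref))

# 2. With balanced data, that prediction combines the marginal means
with(warpbreaks,
     mean(breaks[wool == "A"]) + mean(breaks[tension == "L"]) - mean(breaks))

# 3. It is not the observed cell mean for wool A, tension L ...
with(warpbreaks, mean(breaks[wool == "A" & tension == "L"]))

# ... but the interaction model's intercept is
unname(coef(fit2)[1])
```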
>
> I believe that your results will depend on how the "contrasts" option is set
> in R.  For me it's like this:
> options("contrasts")
>
> $contrasts
>         unordered           ordered
> "contr.treatment"      "contr.poly"
>
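To see how the contrasts setting changes what the intercept means, here is a sketch that refits the same additive model with sum-to-zero contrasts (contr.sum), passed through lm()'s contrasts argument instead of changing the global option. With these contrasts and balanced data such as warpbreaks, the intercept becomes the grand mean of breaks (about 28.15) rather than the wool A / tension L prediction. The fitted values do not change; only the parameterisation does. The names fit_trt and fit_sum are illustrative.

```r
# Same additive model, two different codings of the factors
# (fit_trt uses options("contrasts"), i.e. contr.treatment by default)
fit_trt <- lm(breaks ~ wool + tension, data = warpbreaks)
fit_sum <- lm(breaks ~ wool + tension, data = warpbreaks,
              contrasts = list(wool = "contr.sum", tension = "contr.sum"))

coef(fit_trt)["(Intercept)"]   # prediction for wool A, tension L
coef(fit_sum)["(Intercept)"]   # grand mean of breaks (balanced design)
mean(warpbreaks$breaks)

# The coding changes the coefficients, not the fitted values
all.equal(fitted(fit_trt), fitted(fit_sum))
```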
> Jean
>
>
> Joao Azevedo <joao.c.azevedo at gmail.com> wrote on 07/27/2012 07:16:10 AM:
>
>>
>> Hi!
>>
>> Thanks for the link. I've already stumbled upon that explanation. I'm
>> able to understand how the coding schemes are applied in the supplied
>> examples, but they only use a single explanatory variable. My problem
>> is with understanding the model when there are multiple categorical
>> explanatory variables.
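For the multi-factor case, one way to see the coding scheme is to look at the distinct rows of the design matrix, one per wool/tension cell. This sketch assumes the default treatment contrasts; the names X, cells and X_cells are illustrative. Under that coding the wool A / tension L row has zeros in every indicator column, so its prediction is the intercept alone, while every other cell adds one or two of the remaining coefficients.

```r
# Design matrix for the additive model (treatment contrasts assumed)
X <- model.matrix(~ wool + tension, data = warpbreaks)

# Keep one row per wool/tension combination and label it by its cell
cells <- interaction(warpbreaks$wool, warpbreaks$tension, sep = "/")
first <- !duplicated(cells)
X_cells <- X[first, ]
rownames(X_cells) <- as.character(cells[first])
X_cells

# Predicted cell means = design rows times coefficients
fit <- lm(breaks ~ wool + tension, data = warpbreaks)
X_cells %*% coef(fit)
```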
>>
>> --
>> Joao.
>>
>> On Fri, Jul 27, 2012 at 1:04 PM, Jean V Adams <jvadams at usgs.gov> wrote:
>> > Joao,
>> >
>> > There's a very thorough explanation at
>> > http://www.ats.ucla.edu/stat/r/library/contrast_coding.htm
>> >
>> > Jean
>> >
>> >
>> > Joao Azevedo <joao.c.azevedo at gmail.com> wrote on 07/27/2012 06:32:31 AM:
>> >
>> >>
>> >> Hi!
>> >>
>> >> I'm failing to understand the intercept value in a multiple linear
>> >> regression with categorical explanatory variables. Taking the
>> >> "warpbreaks" data set as an example, when I do:
>> >>
>> >> > lm(breaks ~ wool, data=warpbreaks)
>> >>
>> >> Call:
>> >> lm(formula = breaks ~ wool, data = warpbreaks)
>> >>
>> >> Coefficients:
>> >> (Intercept)        woolB
>> >>      31.037       -5.778
>> >>
>> >> I'm able to understand that the intercept is the mean value of
>> >> breaks when wool equals "A", and that adding the "woolB" coefficient
>> >> to the intercept gives the mean value of breaks when wool equals
>> >> "B". However, if I also consider the tension variable in
>> >> the model, I'm unable to figure out the meaning of the intercept
>> >> value:
>> >>
>> >> > lm(breaks ~ wool + tension, data=warpbreaks)
>> >>
>> >> Call:
>> >> lm(formula = breaks ~ wool + tension, data = warpbreaks)
>> >>
>> >> Coefficients:
>> >> (Intercept)        woolB     tensionM     tensionH
>> >>      39.278       -5.778      -10.000      -14.722
>> >>
>> >> I thought it would be the mean value of breaks when wool equals
>> >> "A" and tension equals "L", but that isn't true for this dataset.
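Writing the fitted additive model out term by term shows why. Using d_B, d_M and d_H as illustrative names for the 0/1 indicator columns woolB, tensionM and tensionH (treatment contrasts assumed), the coefficients above give the predictions below. The intercept is the model's prediction for wool A with tension L, not an average of the raw A/L observations; the observed means for those two cells are about 44.56 and 28.78, so the additive fit differs from both.

```latex
\begin{aligned}
\widehat{y} &= 39.278 - 5.778\,d_B - 10.000\,d_M - 14.722\,d_H \\
\widehat{y}_{A,L} &= 39.278 \qquad (d_B = d_M = d_H = 0) \\
\widehat{y}_{B,M} &= 39.278 - 5.778 - 10.000 = 23.500
\end{aligned}
```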
>> >>
>> >> Any clues on interpreting the value of the intercept?
>> >>
>> >> Thanks!
>> >>
>> >> --
>> >> Joao.


