[R] Correct interpretation of a regression coefficient
Peter Dalgaard
pdalgd at gmail.com
Tue Mar 10 13:21:05 CET 2026
That's actually a shift, not a scaling, no? Just multiplying by a scalar is not going to change the result:
> example$x3 <- example$x1*10
> example$x4 <- example$x2*10
> summary(lm(y ~ x1 + x1:x2, data = example))
Call:
lm(formula = y ~ x1 + x1:x2, data = example)
Residuals:
     Min       1Q   Median       3Q      Max
-13.3499  -2.8748   0.6154   3.2637  11.4331
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.2437     0.4548  -0.536 0.593352
x1            1.9654     0.4932   3.985 0.000131 ***
x1:x2         1.5827     0.5249   3.015 0.003275 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.533 on 97 degrees of freedom
Multiple R-squared: 0.2232, Adjusted R-squared: 0.2072
F-statistic: 13.94 on 2 and 97 DF, p-value: 4.784e-06
> summary(lm(y ~ x3 + x3:x4, data = example))
Call:
lm(formula = y ~ x3 + x3:x4, data = example)
Residuals:
     Min       1Q   Median       3Q      Max
-13.3499  -2.8748   0.6154   3.2637  11.4331
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.243674   0.454821  -0.536 0.593352
x3           0.196545   0.049321   3.985 0.000131 ***
x3:x4        0.015827   0.005249   3.015 0.003275 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.533 on 97 degrees of freedom
Multiple R-squared: 0.2232, Adjusted R-squared: 0.2072
F-statistic: 13.94 on 2 and 97 DF, p-value: 4.784e-06
------
An intuitive explanation for your example is that ~ x1 + x1:x2 is bilinear, but if x1==0 there is no effect of x2.
With ~ x3 + x3:x4 (your shifted versions, x3 = x1 - 1 and x4 = x2 - 1) it is still bilinear, and no effect of x4 is equivalent to no effect of x2, but now that happens when x3==0, i.e. when x1==1. The same thing happens if you force a regression through the origin and then move the origin.
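The shift argument can also be checked numerically. A small sketch (using the shift of 1 from your example; the coefficient values are arbitrary): expanding the product after the shift makes a main-effect term in x4 appear, which y ~ x3 + x3:x4 has no column to absorb.

```r
# Substituting x1 = x3 + 1 and x2 = x4 + 1 into b1*x1 + b12*x1*x2 gives
#   (b1 + b12) + (b1 + b12)*x3 + b12*x4 + b12*x3*x4,
# i.e. a main effect of x4 appears after the shift.
b1 <- 2; b12 <- 1
x1 <- rnorm(5); x2 <- rnorm(5)
x3 <- x1 - 1;   x4 <- x2 - 1
lhs <- b1 * x1 + b12 * x1 * x2
rhs <- (b1 + b12) + (b1 + b12) * x3 + b12 * x4 + b12 * x3 * x4
all.equal(lhs, rhs)  # TRUE
```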
---
I tried to come up with an example showing the corresponding effect with character variables in different languages, and got as far as this:
dd <- expand.grid(A=1:2, B=1:2, rep=1:5)
dd <- within(dd, {en_A <- c("F","M")[A]; en_B <- c("N","Y")[B]})
dd <- within(dd, {da_A <- c("K","M")[A]; da_B <- c("N","J")[B]})
dd$Y <- matrix(c(1,1,1,2),2)[cbind(dd$A,dd$B)] + rnorm(20, sd=.2)
summary(lm(Y~en_A * en_B, dd))
summary(lm(Y~da_A * da_B, dd))
With this, you'll find that the main effect of being male depends on whether we speak English or Danish.
Removing a main effect should then give similar effects to your example, but R's factor coding conventions get in the way because it codes the interaction term with the full indicator parametrization of the other term. I suppose you could force it either by directly modifying the design matrix or by using explicit 0-1 codings.
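For the reduced model, a sketch of the explicit 0/1 route (I flip the reference level of A by hand rather than relying on sort order, and regenerate Y with a seed since the example above had none):

```r
# With Y ~ A + A:B coded via explicit dummies, flipping which level of A
# is the reference changes the fit, just as shifting x1 did above.
set.seed(1)
dd <- expand.grid(A = 1:2, B = 1:2, rep = 1:5)
dd$Y <- matrix(c(1, 1, 1, 2), 2)[cbind(dd$A, dd$B)] + rnorm(20, sd = .2)
dd$m <- as.numeric(dd$A == 2)  # indicator for "male"
dd$f <- 1 - dd$m               # opposite reference level
dd$b <- as.numeric(dd$B == 2)
fit_m <- lm(Y ~ m + m:b, dd)
fit_f <- lm(Y ~ f + f:b, dd)
# Different column spaces, hence different fits:
c(deviance(fit_m), deviance(fit_f))
```

With the full interaction Y ~ m * b versus Y ~ f * b the two fits would coincide; dropping the main effect of b is what makes the reference level matter.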
-pd
> On 9 Mar 2026, at 22.54, Andrew Robinson <apro using unimelb.edu.au> wrote:
>
> Hi Peter,
> hopefully this clarifies.
>
> ## Here's a light-touch example
> set.seed(8675309)
> example <- data.frame(x1 = rnorm(100),
> x2 = rnorm(100))
> example$x3 <- example$x1 - 1
> example$x4 <- example$x2 - 1
> example$y <- with(example,
> 2 * x1 + 4 * x2 + x1 * x2 + rnorm(100) * 2)
> ## In the following code, the statistical information about the
> ## interaction term is the same across the two scalings
> summary(lm(y ~ x1 * x2, data = example))
> summary(lm(y ~ x3 * x4, data = example))
> ## In the following code, the statistical information about the
> ## interaction term is not the same across the two scalings
> summary(lm(y ~ x1 + x1:x2, data = example))
> summary(lm(y ~ x3 + x3:x4, data = example))
> NB: this obscure fact was published in Robinson, A.P., Pocewicz, A.L., Gessler, P.E., 2004. A cautionary note on scaling variables that
> appear only in products in ordinary least squares. Forest Biometry, Modelling and Information Sciences 1, 83–90. I first submitted it to Remote Sensing of Environment (in which this failure to respect strong hierarchy is most pernicious), and R1 said it was completely obvious that failing to respect strong hierarchy was a stupid idea, reject; whereas R2 said they had never heard of this, therefore it could not possibly be true, reject.
> I'm not sure if it's similar to language independence .... ? Interesting conjecture! Can you unpack that a little?
> Cheers,
> Andrew
>
> --
> Andrew Robinson
> Director, CEBRA and Professor of Biosecurity,
> School/s of BioSciences and Mathematics & Statistics
> University of Melbourne, VIC 3010 Australia
> Tel: (+61) 0403 138 955
> Email: apro using unimelb.edu.au
> Website: https://researchers.ms.unimelb.edu.au/~apro@unimelb/
>
> I acknowledge the Traditional Owners of the land I inhabit, and pay my respects to their Elders.
> On Mar 9, 2026 at 21:04 +1100, Peter Dalgaard <pdalgd using gmail.com>, wrote:
>> Example?
>>
>> Is this similar to language independence getting lost under similar circumstances because e.g. Ja/Nej in Danish sorts opposite to Yes/No?
>>
>> -pd
>>
>>> On 9 Mar 2026, at 10.34, Andrew Robinson <apro using unimelb.edu.au> wrote:
>>>
>>> Curiously enough, scale independence is lost in models that lack Nelder's strong heredity (e.g., main effects missing for variables that appear in interactions).
>>> Cheers,
>>> Andrew
>>>
>>> On 9 Mar 2026 at 8:13 PM +1100, Peter Dalgaard <pdalgd using gmail.com>, wrote:
>>>> > Sometimes it is just a matter of units: If you change the predictor from millimeter to meter, then the regression coefficient automatically scales down by a factor 1000. The fit should be the same mathematically, although sometimes very extreme scale differences confuse the numerical algorithms. E.g. the design matrix can be declared singular even though it isn't.
>>>> >
>>>> > (Scale differences have to be pretty extreme to affect OLS, though. More common is that nonlinear methods are impacted via convergence criteria or numerical derivatives.)
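A quick sketch of the unit-change point, with hypothetical data: rescaling a predictor from millimetres to metres multiplies its coefficient by 1000 and leaves the fit untouched.

```r
set.seed(2)
d <- data.frame(mm = rnorm(50, mean = 500, sd = 50))  # lengths in mm
d$m <- d$mm / 1000                                    # same lengths in m
d$y <- 0.002 * d$mm + rnorm(50)
fit_mm <- lm(y ~ mm, d)
fit_m  <- lm(y ~ m, d)
# Coefficient scales by 1000; fitted values are identical:
all.equal(unname(coef(fit_m)["m"]), 1000 * unname(coef(fit_mm)["mm"]))  # TRUE
all.equal(fitted(fit_mm), fitted(fit_m))                                # TRUE
```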
>>>> >
>>>> > -pd
>>>> >
>>>>> >> On 8 Mar 2026, at 19.15, Brian Smith <briansmith199312 using gmail.com> wrote:
>>>>> >>
>>>>> >> Hi Michael,
>>>>> >>
>>>>> >> You made an interesting point that, scale of the underlying variable
>>>>> >> may be vastly different as compared with other variables in the
>>>>> >> equation.
>>>>> >>
>>>>> >> Could I use logarithm of that variable instead of raw? Another
>>>>> >> possibility is that we could standardise that variable. But IMO, for
>>>>> >> out of sample prediction, the interpretation of standardisation is not
>>>>> >> straightforward.
>>>>> >>
>>>>> >> On Sun, 8 Mar 2026 at 23:05, Michael Dewey <lists using dewey.myzen.co.uk> wrote:
>>>>>>> >>> >
>>>>>>> >>> > Dear Brian
>>>>>>> >>> >
>>>>>>> >>> > You have not given us much to go on here but the problem is often
>>>>>>> >>> > related to the scale of the variables. So if the coefficient is per year
>>>>>>> >>> > try to re-express time in months, weeks, or days.
>>>>>>> >>> >
>>>>>>> >>> > Michael
>>>>>>> >>> >
>>>>>>> >>> > On 08/03/2026 11:50, Brian Smith wrote:
>>>>>>>>> >>>> >> Hi,
>>>>>>>>> >>>> >>
>>>>>>>>> >>>> >> My question is not directly related to R, but rather a basic question
>>>>>>>>> >>>> >> about statistics. I am hoping to receive valuable insights from the
>>>>>>>>> >>>> >> expert statisticians in this group.
>>>>>>>>> >>>> >>
>>>>>>>>> >>>> >> In some cases, when fitting a simple OLS regression, I obtain an
>>>>>>>>> >>>> >> estimated beta coefficient that is very small—for example, 0.00034—yet
>>>>>>>>> >>>> >> it still appears statistically significant based on the p-value.
>>>>>>>>> >>>> >>
>>>>>>>>> >>>> >> I am trying to understand how to interpret such a result in practical
>>>>>>>>> >>>> >> terms. From a magnitude perspective, such a small coefficient would
>>>>>>>>> >>>> >> not be expected to meaningfully affect the predicted response value,
>>>>>>>>> >>>> >> but statistically it is still considered significant.
>>>>>>>>> >>>> >>
>>>>>>>>> >>>> >> I would greatly appreciate any insights or explanations regarding this
>>>>>>>>> >>>> >> phenomenon.
>>>>>>>>> >>>> >>
>>>>>>>>> >>>> >> Thanks for your time.
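A minimal sketch of how this arises (made-up numbers; the point is simply that the predictor lives on a large scale):

```r
set.seed(3)
x <- rnorm(200, sd = 10000)    # predictor on a large scale
y <- 0.00034 * x + rnorm(200)  # true beta = 0.00034
fit <- summary(lm(y ~ x))
fit$coefficients["x", c("Estimate", "Pr(>|t|)")]
# The estimate is tiny, but 0.00034 * sd(x) is about 3.4 against a noise
# sd of 1, so x is both highly significant and practically important.
```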
>>>>>>>>> >>>> >>
>>>>>>>>> >>>> >> ______________________________________________
>>>>>>>>> >>>> >> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>>>>> >>>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>>> >>>> >> PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
>>>>>>>>> >>>> >> and provide commented, minimal, self-contained, reproducible code.
>>>>>>> >>> >
>>>>>>> >>> > --
>>>>>>> >>> > Michael Dewey
>>>>>>> >>> >
>>>>> >>
>>>> >
>>>> >
>>
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes using cbs.dk Priv: PDalgd using gmail.com