[R] OLS standard errors

Tue Feb 26 08:57:54 CET 2008

Daniel Malter wrote:
> Hi,
>
> the standard errors of the coefficients in two regressions that I computed
> by hand and using lm() differ by about 1%. Can somebody help me to identify
> the source of this difference? The coefficient estimates are the same, but
> the standard errors differ. 
>
> ####Simulate data
> 	
> 	happiness=0
> 	income=0
> 	gender=(rep(c(0,1,1,0),25))
> 		for(i in 1:100){
>   			happiness[i]=1000+i+rnorm(1,0,40)
>  			income[i]=2*i+rnorm(1,0,40)
>   			}
>
> ####Run lm()
>
> 	reg=lm(happiness~income+factor(gender))
> 	summary(reg)
>
> ####Compute coefficient estimates "by hand"
>
> 	x=cbind(income,gender)
> 	y=happiness
>
> 	z=as.matrix(cbind(rep(1,100),x))
> 	beta=solve(t(z)%*%z)%*%t(z)%*%y
>
> ####Compare estimates
>
> 	cbind(reg$coef,beta)  ##fine so far, they both look the same
> 	
> 	reg$coef[1]-beta[1]
> 	reg$coef[2]-beta[2]
> 	reg$coef[3]-beta[3]	##differences are too small to cause a 1%
> difference
>
> ####Check predicted values
>
> 	estimates=c(beta[2],beta[3])
>
> 	predicted=estimates%*%t(x)
> 	predicted=as.vector(t(as.double(predicted+beta[1])))
> 	
> 	cbind(reg$fitted,predicted)		##inspect fitted values
> 	as.vector(reg$fitted-predicted)	##differences are marginal
>
> #### Compute errors
>
> 	e=NA
> 	e2=NA
> 	for(i in 1:length(happiness)){
>   		e[i]=y[i]-predicted[i]   ##for "hand-computed" regression
>   		e2[i]=y[i]-reg$fitted[i] ##for lm() regression
>   	}
>
> #### Compute standard error of the coefficients
>
>   sqrt(abs(var(e)*solve(t(z)%*%z)))	##for "hand-computed" regression
>   sqrt(abs(var(e2)*solve(t(z)%*%z)))	##for lm() regression using e2 from
> above
>
> 	##Both are the same
>
> ####Compare to lm() standard errors of the coefficients again
>
> 	summary(reg)
>
>
> The diagonal elements of the variance/covariance matrices should be the
> standard errors of the coefficients. Both are identical when computed by
> hand. However, they differ from the standard errors reported in
> summary(reg). The difference of 1% seems nonmarginal. Should I have
> multiplied var(e)*solve(t(z)%*%z) by n and divided by n-1? But even if I do
> this, I still observe a difference. Can anybody help me out what the source
> of this difference is?
>
>   
The degrees of freedom in a regression analysis is n minus the number of 
parameters, three in your case. I.e. the variance var(e) does not know 
about this and divides by n-1 where it should have been n-3, so.....

> Cheers,
> Daniel
>
>
> -------------------------
> cuncta stricte discussurus
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>   


-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907