[R] weighted mean
    Peter Dalgaard 
    p.dalgaard at biostat.ku.dk
       
    Wed Nov 26 14:26:57 CET 2003
    
    
  
Jason Turner <jasont at indigoindustrial.co.nz> writes:
> MZodet at ahrq.gov wrote:
> 
> > How do I go about generating a WEIGHTED mean (and standard error) of a
> > variable (e.g., expenditures) for each level of a categorical variable
> > (e.g., geographic region)?  I'm looking for something comparable to PROC
> > MEANS in SAS with both a class and weight statement.
> 
> That's two questions.
> 1) to apply a weighted mean to a vector, see ?weighted.mean
 
> 2) to apply a function to data grouped by categorical variable, you
> probably need "by" or "tapply".  See the help pages and examples for
> both.
Three, actually. No one seems to have answered how to get the SD, and
that's a little more tricky.
The simplest (well, the quickest) way to get the weighted SD is to do
a weighted regression analysis with just an intercept term:
x <- c(3,4,5); w <- c(2,5,7) # just for testing
summary(lm(x ~ 1, weights = w))$sigma
# the same value by hand: weighted sum of squares on N-1 = 2 DF
m <- weighted.mean(x, w)
wss <- sum((x-m)^2*w)
sqrt(wss/2)
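The original question also asked for the standard error. The sketch below reuses the toy x and w from above. The same intercept-only fit reports the SE directly: the intercept is the weighted mean, and its standard error is sigma/sqrt(sum(w)). This treats w as precision (analytic) weights. Whether it matches a particular SAS option setting is an assumption worth checking against SAS's own output.

```r
# Same toy data as above
x <- c(3, 4, 5)
w <- c(2, 5, 7)

fit <- lm(x ~ 1, weights = w)
s   <- summary(fit)

# The intercept of an intercept-only weighted fit is the weighted mean
est <- coef(fit)[["(Intercept)"]]
# Its reported standard error equals sigma / sqrt(sum(w))
se  <- s$coefficients["(Intercept)", "Std. Error"]

stopifnot(isTRUE(all.equal(est, weighted.mean(x, w))))
stopifnot(isTRUE(all.equal(se, s$sigma / sqrt(sum(w)))))
c(mean = est, se = se)
```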
Notice, however, that SAS also does frequency weighting, where
(x=2.7,w=5) means that there are five observations of 2.7.
In that case, the brute-force approach is 
sd(rep(x,w))
# which is the same as
sqrt(wss/13) # sum(w)-1 DF
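To tie this back to the per-region question, here is a minimal sketch that applies both conventions to each level of a factor. It uses split() and sapply(); tapply() or by() would work similarly. The data frame dat and the helper wstats() are hypothetical names made up for illustration, and the numbers in dat are invented. The SE is taken as sd/sqrt(sum(w)) in both cases. For analytic weights this is what lm() reports. For frequency weights it is sd(rep(x, w))/sqrt(sum(w)).

```r
# Hypothetical example data: expenditure, weight and region per record
dat <- data.frame(
  expend = c(3, 4, 5, 10, 12, 11, 7),
  wt     = c(2, 5, 7, 1, 3, 2, 4),
  region = factor(c("N", "N", "N", "S", "S", "S", "S"))
)

# Hypothetical helper: weighted mean, SD and SE for one group
wstats <- function(x, w, freq = FALSE) {
  m   <- weighted.mean(x, w)
  wss <- sum(w * (x - m)^2)
  # Divisor: N-1 for analytic weights, sum(w)-1 for frequency weights
  dof <- if (freq) sum(w) - 1 else length(x) - 1
  s   <- sqrt(wss / dof)
  c(mean = m, sd = s, se = s / sqrt(sum(w)))
}

# One row per region, columns mean/sd/se (analytic weights)
res <- t(sapply(split(dat, dat$region),
                function(d) wstats(d$expend, d$wt)))
res

# Frequency-weight version, e.g. for region "N" only
wstats(dat$expend[dat$region == "N"], dat$wt[dat$region == "N"], freq = TRUE)
```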
-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907