[R] mean of subset of rows

Mon Oct 1 18:42:33 CEST 2007

--- darteta001 at ikasle.ehu.es wrote:

> Dear list, 
> this must be an easy one:
> 
> I have a data.frame of two columns, "ID" with four different levels
> (A to D) and numerical "size", and each of the 4 different IDs is
> repeated a different number of times. I would like to get the mean
> size for each ID as another data.frame. I have tried the following:
> 
> >ID= as.character(unique(data[,1])) # I use unique() because "data" will be larger in future
> >nIDs = length(ID)
> >for(i in 1:nIDs){
> +  subdata = subset(data,V1==ID[i])
> +  average = as.data.frame(cbind(1:i,ID[i],mean(subdata[,2])))
> + }
> 
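dfnames <- c("id","v1")

# Example data: name the columns with '=' inside data.frame(), not '<-'
mydata <- data.frame(id = as.factor(c("a","a","b","c","c","b")),
                     v1 = c(2,3,3,2,2,4))
mydata

# Mean of v1 for each level of id; 'by' must be a list, and a
# one-column data frame such as mydata["id"] qualifies
mysums <- aggregate(mydata["v1"], by = mydata["id"], FUN = mean)
names(mysums) <- dfnames
mysums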

The problem with your loop is that average is overwritten on
every pass, so you have no place to store the results of each
iteration; only the last pass survives.
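
To illustrate why the surviving pass shows the last ID four times,
here is a small sketch. The ID vector and the value 179.78 are
placeholders modelled on the shape of your output, not your real
data. On the last pass i is 4, so cbind() recycles the single ID
and the single mean against 1:4:

```r
ID <- c("A", "B", "C", "D")   # placeholder IDs, assumed for illustration
i  <- 4                       # value of i on the last pass of the loop

# cbind() recycles the two length-1 arguments to match 1:i,
# so every one of the i rows carries the same ID and the same mean
average <- as.data.frame(cbind(1:i, ID[i], 179.78))
average
```

Any earlier pass would have produced fewer rows, but each one was
replaced by the next assignment to average.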

The loop below should work, but you are much better off using
aggregate(). Explicit for loops are usually avoided in R when a
built-in function can do the grouping for you. Good luck.

data <- mydata
ID <- as.character(unique(data[,1]))
nIDs <- length(ID)
average <- numeric(nIDs)      # one slot per ID to hold each result
for (i in seq_len(nIDs)) {
  subdata <- subset(data, id == ID[i])
  average[i] <- mean(subdata[,2])   # store this ID's mean in slot i
}

average
newdata <- data.frame(ID, average)
names(newdata) <- dfnames
newdata
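
If a loop-free alternative to aggregate() is of interest, tapply()
is another option. The following is only a sketch on the same toy
data; the names group_means and newdata2 are made up for this
example:

```r
mydata <- data.frame(id = factor(c("a","a","b","c","c","b")),
                     v1 = c(2,3,3,2,2,4))

# tapply() applies mean() to v1 within each level of id and
# returns a vector named by those levels
group_means <- tapply(mydata$v1, mydata$id, mean)

# Rebuild a two-column data frame from the names and the values
newdata2 <- data.frame(id = names(group_means),
                       v1 = as.vector(group_means))
newdata2
```

Unlike the loop, this returns the groups in factor-level order
rather than in order of first appearance.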

> Unfortunately, my output only gets the last level of
> ID four times:
> >average
>      V1 V2               V3
> 1  1  D 179.777777777778
> 2  2  D 179.777777777778
> 3  3  D 179.777777777778
> 4  4  D 179.777777777778
> 
> How can I get what I need? There might be an easier way to do it,
> but I guess my skills aren't that good. Any suggestions are welcome.
> 
> Regards,
> 
> David