[R] aggregate.formula implicitly removes rows containing NA
    Peter Ehlers 
    ehlers at ucalgary.ca
       
    Wed Jan 12 02:13:48 CET 2011
    
    
  
On 2011-01-11 14:41, Dickison, Daniel wrote:
> The documentation for `aggregate` makes it sound like aggregate.formula should behave identically to aggregate.data.frame (apart from the way the parameters are passed).  But it looks like aggregate.formula is quietly removing rows where any of the "output" variables (those on the LHS of the formula) are NA.  This differs from how aggregate.data.frame works.  Is this expected behavior?
>
> Here are a couple of examples:
>
>> d<- data.frame(a=rep(1:2, each=2),
> +                 b=c(1,2,NA,3))
>> aggregate(d["b"], d["a"], mean)
>    a   b
> 1 1 1.5
> 2 2  NA
>> aggregate(b ~ a, d, mean)
>    a   b
> 1 1 1.5
> 2 2 3.0
>
> It's removing whole rows even if just one of the columns is NA, i.e.:
>
>> d<- data.frame(a=rep(1:2, each=2),
> +                 b=c(1,2,NA,3),
> +                 c=c(NA,2,3,NA))
>> aggregate(cbind(b,c) ~ a, d, mean)
>    a b c
> 1 1 2 2
>
> Daniel
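Try setting na.action = na.pass. The formula method defaults to
na.action = na.omit, which drops every row with an NA in any model
variable before aggregating, so yes, this is expected.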
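
A minimal sketch using the second example data frame from the question. The result names (res_omit, res_pass, res_narm) are only illustrative, and the last call assumes you want NAs ignored within each column rather than propagated:

```r
d <- data.frame(a = rep(1:2, each = 2),
                b = c(1, 2, NA, 3),
                c = c(NA, 2, 3, NA))

# Default (na.action = na.omit): incomplete rows are dropped first,
# so only the single complete row (a = 1, b = 2, c = 2) is aggregated.
res_omit <- aggregate(cbind(b, c) ~ a, data = d, FUN = mean)

# na.pass keeps every row, so mean() sees the NAs and returns NA,
# matching what aggregate.data.frame does.
res_pass <- aggregate(cbind(b, c) ~ a, data = d, FUN = mean,
                      na.action = na.pass)

# Keep every row but let mean() skip NAs column by column:
# na.rm = TRUE is passed through "..." to FUN.
res_narm <- aggregate(cbind(b, c) ~ a, data = d, FUN = mean,
                      na.action = na.pass, na.rm = TRUE)

print(res_omit)
print(res_pass)
print(res_narm)
```

With na.pass alone, group 1 gives b = 1.5 and c = NA, and group 2 gives NA for both. Adding na.rm = TRUE gives b = 1.5, c = 2 for group 1 and b = 3, c = 3 for group 2.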
Peter Ehlers
    
    