[R] Problem with subset() function?
    Steven McKinney 
    smckinney at bccrc.ca
       
    Wed Jan 21 00:02:14 CET 2009
    
    
  
Hi all,
Can anyone explain why the following use of
the subset() function produces a different
outcome than the use of the "[" extractor?
The subset() function as used in
 density(subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age)))
appears to me from documentation to be equivalent to
 density(mydf[mydf$ht >= 150.0 & mydf$wt <= 150.0, "age"])
(modulo exclusion of NAs) but use of the former yields an 
error from density.default() (shown below).
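
The "modulo exclusion of NAs" caveat can be sketched on a small example.
The data frame toy and the names by_subset and by_bracket are hypothetical
and do not come from the session in this post. As far as I understand it,
subset() treats an NA in the condition as FALSE, whereas logical indexing
with "[" returns an all-NA row for it:

```r
# Hypothetical toy data, not part of the original example
toy <- data.frame(ht = c(160, NA, 140), age = c(30, 40, 50))

# subset() treats the NA in the condition as FALSE and drops that row
by_subset <- subset(toy, ht >= 150)

# "[" keeps a row of NAs where the logical index is NA
by_bracket <- toy[toy$ht >= 150, ]

nrow(by_subset)   # 1
nrow(by_bracket)  # 2
```

The mydf example in this post contains no NAs, so this difference does not
explain the error reported here.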
Is this a bug in the subset() machinery?  Or is it
a documentation issue, in the help page for either
subset() or density()?
I'm seeing issues such as this with newcomers to R
who initially seem to prefer using subset() instead
of the bracket extractor.  At this point these functions
are clearly not interchangeable.  Should the code be patched
so that they are, or the documentation amended to show
when use of subset() is not appropriate?
> ### Bug in subset()?
> set.seed(123)
> mydf <- data.frame(ht = 150 + 10 * rnorm(100),
+                    wt = 150 + 10 * rnorm(100),
+                    age = sample(20:60, size = 100, replace = TRUE)
+                    )
> density(subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age)))
Error in density.default(subset(mydf, ht >= 150 & wt <= 150, select = c(age))) : 
  argument 'x' must be numeric
> density(mydf[mydf$ht >= 150.0 & mydf$wt <= 150.0, "age"])
Call:
	density.default(x = mydf[mydf$ht >= 150 & mydf$wt <= 150, "age"])
Data: mydf[mydf$ht >= 150 & mydf$wt <= 150, "age"] (29 obs.);	Bandwidth 'bw' = 5.816
       x                y            
 Min.   : 4.553   Min.   :3.781e-05  
 1st Qu.:22.776   1st Qu.:3.108e-03  
 Median :41.000   Median :1.775e-02  
 Mean   :41.000   Mean   :1.370e-02  
 3rd Qu.:59.224   3rd Qu.:2.128e-02  
 Max.   :77.447   Max.   :2.665e-02  
> sessionInfo()
R version 2.8.0 Patched (2008-11-06 r46845) 
powerpc-apple-darwin9.5.0 
locale:
C
attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     
loaded via a namespace (and not attached):
[1] Matrix_0.999375-16 grid_2.8.0         lattice_0.17-15    lme4_0.99875-9    
[5] nlme_3.1-89       
> 
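
A possible explanation for the error above, offered as a sketch rather than
a definitive diagnosis: subset() on a data frame returns a data frame even
when select names a single column (in current R its drop argument defaults
to FALSE), whereas mydf[rows, "age"] drops a single column to a plain
vector. density.default() then rejects the data frame because it is not
numeric. The code below rebuilds mydf as in the session above. The names
age_df, age_vec, d1 and d2 are illustrative, and the two workarounds are
suggestions rather than anything confirmed in this thread:

```r
set.seed(123)
mydf <- data.frame(ht = 150 + 10 * rnorm(100),
                   wt = 150 + 10 * rnorm(100),
                   age = sample(20:60, size = 100, replace = TRUE))

# subset() keeps a one-column data frame; "[" drops it to a vector
age_df  <- subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age))
age_vec <- mydf[mydf$ht >= 150.0 & mydf$wt <= 150.0, "age"]
class(age_df)    # "data.frame"
class(age_vec)   # "integer"

# Two ways to hand density() a numeric vector while still using subset():
# extract the column with $, or ask subset() to drop to a vector
d1 <- density(subset(mydf, ht >= 150.0 & wt <= 150.0)$age)
d2 <- density(subset(mydf, ht >= 150.0 & wt <= 150.0,
                     select = age, drop = TRUE))
```

Both d1 and d2 are computed from the same values as the "[" call, so either
form should give the same density estimate as the bracket version.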
Steven McKinney
Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre
email: smckinney +at+ bccrc +dot+ ca
tel: 604-675-8000 x7561
BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3
Canada
    
    