[BioC] undefined columns selected error when using bagging{ipred}
    Valerie Obenchain 
    vobencha at fhcrc.org
       
    Sun Sep  9 16:45:46 CEST 2012
    
    
  
Hi Constanze,
The problem appears to be in how bagging() handles the column 
names of the sample data frame. The immediate workaround is to give the 
columns non-numeric names:
 > bagg <- bagging(response ~., data = exprDF[,selected], ntrees = 100)
Error in `[.data.frame`(m, attr(Terms, "term.labels")) :
undefined columns selected
 > dat <- exprDF[,selected]
 > colnames(dat) <- paste0("A", 1:ncol(dat))
 > bagg <- bagging(response ~., data = dat, ntrees = 100)
 > bagg
Bagging survival trees with 25 bootstrap replications
Call: bagging.data.frame(formula = response ~ ., data = df, ntrees = 100)
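The same workaround can be written with base R's make.names(), which prepends "X" to names that start with a digit, so the original identifiers stay recognizable. The sketch below is hypothetical: the small data frame `dat` and the response `y` stand in for exprDF[, selected] and the Surv object. It only checks that the term labels built from `~ .` match the column names after renaming; it does not call bagging() itself.

```r
# Hypothetical stand-in for exprDF[, selected]: columns named by numbers
dat <- setNames(as.data.frame(matrix(c(1.2, 0.4, 2.2, 1.9,
                                       3,   1,   2,   4,
                                       0.5, 0.1, 0.9, 0.3), nrow = 4)),
                c("101", "205", "317"))
y <- c(2.0, 1.1, 3.4, 2.8)  # hypothetical response, kept outside `dat`
orig_names <- names(dat)

# With numeric-looking names, the labels from `y ~ .` come back backquoted
# and therefore do not match the column names
old_labels <- attr(terms(y ~ ., data = dat), "term.labels")
print(old_labels %in% orig_names)

# Workaround: make every column name syntactically valid ("101" -> "X101")
names(dat) <- make.names(names(dat))
new_labels <- attr(terms(y ~ ., data = dat), "term.labels")
print(new_labels)
print(all(new_labels %in% names(dat)))
```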
As you've seen from the error messages you've hit while working through 
these examples, several packages are no longer maintained and many 
functions have changed since the book was written. ipred, the package 
bagging() comes from, is still maintained, so I'm cc'ing its 
maintainer because this issue may be a bug.
Hi Torsten,
It looks like bagging() fails when the column names are numbers coerced 
to character. Using a modified example from ?bagging:
library(ipred)     # bagging() and the DLBCL data
library(survival)  # Surv()
data(DLBCL)
## first example works fine
mod <- bagging(Surv(time,cens) ~ ., data=DLBCL, coob=TRUE)
## change the column names of the data.frame
names(DLBCL) <- c("DLCL.Sample", "Gene.Expression", "time", "cens", 
"IPI", 1:10)
 > names(DLBCL)
[1] "DLCL.Sample" "Gene.Expression" "time" "cens"
[5] "IPI" "1" "2" "3"
[9] "4" "5" "6" "7"
[13] "8" "9" "10"
 > mod <- bagging(Surv(time,cens) ~ ., data=DLBCL, coob=TRUE)
Error in `[.data.frame`(m, attr(Terms, "term.labels")) :
undefined columns selected
The error is thrown by this line in irpart():
isord <- unlist(lapply(m[attr(Terms, "term.labels")], tfun))
When 'Terms' is created, the term labels for the numeric names are 
wrapped in backticks (`), which prevents them from matching the 
column names of the data frame (m):
debugging in: irpart(y ~ ., data = mydata, control = control, bcontrol = list(nbagg = nbagg, ns = ns, replace = REPLACE))
...
Browse[2]>
debug: Terms <- attr(m, "terms")
...
Browse[2]> attr(Terms, "term.labels")
[1] "DLCL.Sample" "Gene.Expression" "IPI" "`1`"
[5] "`2`" "`3`" "`4`" "`5`"
[9] "`6`" "`7`" "`8`" "`9`"
[13] "`10`"
...
Browse[2]> colnames(m)
[1] "y" "DLCL.Sample" "Gene.Expression" "IPI"
[5] "1" "2" "3" "4"
[9] "5" "6" "7" "8"
[13] "9" "10"
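The mismatch can be reproduced without ipred. The sketch below mimics the failing lookup in irpart() with base R's model.frame() on a hypothetical two-column data frame `d`, then shows one possible fix: stripping the surrounding backticks from the term labels before indexing. This only illustrates the idea; it is not the change made (or planned) in ipred.

```r
# Hypothetical toy data with numeric-looking predictor names, as in the example
d <- data.frame(y   = c(2.0, 1.1, 3.4, 2.8),
                `1` = c(1.2, 0.4, 2.2, 1.9),
                `2` = c(3, 1, 2, 4),
                check.names = FALSE)

m <- model.frame(y ~ ., data = d)       # columns are "y", "1", "2"
Terms <- attr(m, "terms")
labs <- attr(Terms, "term.labels")      # labels are backquoted

# Indexing the model frame by the backquoted labels fails, as in irpart()
err <- tryCatch(m[labs], error = function(e) conditionMessage(e))
print(err)

# Possible fix (hypothetical): strip the surrounding backticks first
clean <- sub("^`(.*)`$", "\\1", labs)
print(names(m[clean]))
```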
Valerie
On 09/05/12 08:21, Constanze [guest] wrote:
> Dear All,
>
> I'm trying to reproduce the results of the survival analysis in Chapter 17, p. 307 of "Bioinformatics and Computational Biology Solutions using R and Bioconductor" using the code chunks from http://www.bioconductor.org/help/publications/books/bioinformatics-and-computational-biology-solutions/chapter-code/Computational_Inference.R
> The call to the bagging function throws an error, even though I reduced the number of selected variables to p = 25 (so the model fit wouldn't be over-determined). The code is below.
>
> Thanks a lot,
>
> Constanze
>
>
>> library("exactRankTests")
>   Package ‘exactRankTests’ is no longer under development.
>   Please consider using package ‘coin’ instead.
>
>> # library("coin")
>> library("ipred")
> Loading required package: rpart
> Loading required package: MASS
> Loading required package: mlbench
> Loading required package: nnet
> Loading required package: class
>> library("kidpack")
> *** Deprecation warning ***:
> The package 'kidpack' is deprecated and will not be supported after Bioconductor release 2.1.
>
>
>> data(eset)
>> var_selection <- function(indx, expressions, response, p = 100) {
> +
> +     y <- switch(class(response),
> +         "factor" = { model.matrix(~ response - 1)[indx, , drop = FALSE] },
> +         "Surv" = { matrix(cscores(response[indx]), ncol = 1) },
> +         "numeric" = { matrix(rank(response[indx]), ncol = 1) }
> +     )
> +
> +     x <- expressions[, indx, drop = FALSE]
> +     n <- nrow(y)
> +     linstat <- x %*% y
> +     Ey <- matrix(colMeans(y), nrow = 1)
> +     Vy <- matrix(rowMeans((t(y) - as.vector(Ey))^2), nrow = 1)
> +
> +     rSx <- matrix(rowSums(x), ncol = 1)
> +     rSx2 <- matrix(rowSums(x^2), ncol = 1)
> +     E <- rSx %*% Ey
> +     V <- n / (n - 1) * kronecker(Vy, rSx2)
> +     V <- V - 1 / (n - 1) * kronecker(Vy, rSx^2)
> +
> +     stats <- abs(linstat - E) / sqrt(V)
> +     stats <- do.call("pmax", as.data.frame(stats))
> +     return(which(stats > sort(stats)[length(stats) - p]))
> + }
>>
>> remove <- is.na(eset$survival.time)
>> seset <- eset[, !remove]
>> response <- Surv(seset$survival.time, seset$died)
>> response[response[, 1] == 0] <- 1
>> expressions <- t(apply(exprs(seset), 1, rank))
>> exprDF <- as.data.frame(t(expressions))
>>
>> I <- nrow(exprDF)
>> Iindx <- 1:I
>> selected <- var_selection(Iindx, expressions, response, p = 25)
>> bagg <- bagging(response ~ ., data = exprDF[, selected], ntrees = 100)
> Error in `[.data.frame`(m, attr(Terms, "term.labels")) :
>    undefined columns selected
>
>
>   -- output of sessionInfo():
>
> R version 2.15.1 (2012-06-22)
> Platform: i486-pc-linux-gnu (32-bit)
>
> locale:
>   [1] LC_CTYPE=de_DE.utf8       LC_NUMERIC=C
>   [3] LC_TIME=de_DE.utf8        LC_COLLATE=de_DE.utf8
>   [5] LC_MONETARY=de_DE.utf8    LC_MESSAGES=de_DE.utf8
>   [7] LC_PAPER=C                LC_NAME=C
>   [9] LC_ADDRESS=C              LC_TELEPHONE=C
> [11] LC_MEASUREMENT=de_DE.utf8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] splines   stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
>   [1] kidpack_1.5.10        ipred_0.8-8           class_7.3-4
>   [4] nnet_7.3-4            mlbench_2.1-1         MASS_7.3-21
>   [7] rpart_3.1-54          exactRankTests_0.8-22 affy_1.26.0
> [10] Biobase_2.8.0         survival_2.36-14
>
> loaded via a namespace (and not attached):
> [1] affyio_1.16.0         preprocessCore_1.10.0 tools_2.15.1
>
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
    
    