[R] RandomForest question
    Liaw, Andy 
    andy_liaw at merck.com
       
    Thu Jul 21 16:16:43 CEST 2005
    
    
  
> From: Arne.Muller at sanofi-aventis.com
> 
> Hello,
> 
> I'm trying to find the optimal number of variables tried at each
> split (the mtry parameter) for a randomForest classification. The
> classification is binary, with 32 explanatory variables (mostly
> factors with up to 4 levels each, plus some numeric variables)
> and 575 cases.
> 
> I've seen that although there are only 32 explanatory 
> variables the best classification performance is reached when 
> choosing mtry=80. How is it possible that more variables can 
> be used than there are columns in the data frame?
It's not.  The code for randomForest.default() has:
    ## Make sure mtry is in reasonable range.
    mtry <- max(1, min(p, round(mtry)))
so it silently sets mtry to the number of predictors (p) if it's too large.
With your 32 predictors, mtry=80 is therefore treated as mtry=32.
As an example:
> library(randomForest)
randomForest 4.5-12 
Type rfNews() to see new features/changes/bug fixes.
> iris.rf = randomForest(Species ~ ., iris, mtry=10)
> iris.rf$mtry
[1] 4
I should probably add a warning in such cases...
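
One possible shape for such a check, as a sketch only: clamp_mtry() is a
hypothetical helper and not part of the randomForest package. It applies the
same clamping expression shown above and warns whenever the requested value
gets changed. It uses base R only.

```r
## Hypothetical helper (not in randomForest): clamp mtry to [1, p]
## exactly as randomForest.default() does, but warn when it changes.
clamp_mtry <- function(mtry, p) {
    clamped <- max(1, min(p, round(mtry)))
    if (clamped != mtry)
        warning("mtry adjusted from ", mtry, " to ", clamped,
                " (valid range is 1 to ", p, ")")
    clamped
}

clamp_mtry(80, 32)   # returns 32, with a warning
clamp_mtry(4, 32)    # returns 4, no warning
```

With a check like this, a search over mtry only needs to cover 1 to p, since
any larger value ends up being the same as p.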
Andy
 
> 	thanks for your help
> 	+ kind regards,
> 
> 	Arne