[BioC] classification method applied to microarrays (CMA package)
Stephen Henderson
to.stephen.henderson at googlemail.com
Tue Oct 27 12:59:24 CET 2009
The svm is a reasonable classifier that performs OK on microarray data
and usually requires no tuning of parameters, although many other
classifiers perform OK too.
To understand the GeneSelection method you first need to understand
cross validation (this occurs within the classification function).
Cross validation estimates the classification error by splitting the
data into many training and test set combinations. The model (your
svm) is built on the training set and then tested against the test
set to see how many classification errors are made.
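
As a rough illustration of the cross validation idea described above
(not CMA code, and not an SVM), here is a minimal Python sketch. The
nearest-centroid classifier is a stand-in chosen only to keep the
example short, and the function names and toy data are all assumptions
made for illustration.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k roughly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_centroids(X, y):
    """Stand-in classifier (NOT an SVM): one mean profile per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (squared distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

def cv_error(X, y, k=5, seed=0):
    """Fraction of held-out samples misclassified over all k folds."""
    errors = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        # Build the model on the training samples only...
        model = fit_centroids([X[i] for i in train_idx],
                              [y[i] for i in train_idx])
        # ...then count the mistakes on the samples it has never seen.
        errors += sum(predict(model, X[i]) != y[i] for i in test_idx)
    return errors / len(X)

# Toy data (made up): 2 "genes", 30 samples per class, gene 0 differs.
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)] + \
    [[rng.gauss(5, 1), rng.gauss(0, 1)] for _ in range(30)]
y = ["A"] * 30 + ["B"] * 30
print("estimated misclassification rate:", cv_error(X, y))
```

Every sample is held out exactly once, so the error is counted only on
samples the model was not trained on.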
If you choose GeneSelection (which you probably should) then the data
is reduced to a subset of features/genes based on a simple statistic.
However, not just one set of genes is selected: a separate set is
selected for every training set in the cross validation. Otherwise
the genes would be chosen with knowledge of the test samples, and the
estimated svm misclassification error would be too optimistic (an
underestimate of the true error).
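
To make the per-training-set selection concrete, here is a small
Python sketch. It is not how CMA implements GeneSelection; it assumes
a Welch t-statistic as the "simple stat", and the function names, fold
scheme and toy data are hypothetical. The point is only that the
ranking is recomputed from the training samples of each split, so the
held-out samples never influence which genes are kept.

```python
import math
import random

def t_statistic(a, b):
    """Welch t-statistic between two lists of expression values."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((v - ma) ** 2 for v in a) / (len(a) - 1)
    vb = sum((v - mb) ** 2 for v in b) / (len(b) - 1)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (ma - mb) / se if se > 0 else 0.0

def select_genes(X, y, n_top):
    """Rank genes (columns) by |t| using only the samples passed in."""
    labels = sorted(set(y))  # assumes exactly two classes
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == labels[0]]
        b = [row[g] for row, lab in zip(X, y) if lab == labels[1]]
        scores.append((abs(t_statistic(a, b)), g))
    return [g for _, g in sorted(scores, reverse=True)[:n_top]]

def per_fold_selection(X, y, k=5, n_top=3, seed=0):
    """Return one selected-gene list per training set of a k-fold split."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    selected = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [i for i in idx if i not in held_out]
        # Selection sees the training samples only, never the test fold.
        selected.append(select_genes([X[i] for i in train],
                                     [y[i] for i in train], n_top))
    return selected

# Toy data (made up): 20 genes, 20 samples per class; only genes 0 and 1
# really differ between the classes.
rng = random.Random(2)
def sample(shift):
    return [rng.gauss(shift if g < 2 else 0.0, 1.0) for g in range(20)]
X = [sample(0.0) for _ in range(20)] + [sample(5.0) for _ in range(20)]
y = ["A"] * 20 + ["B"] * 20

for fold, genes in enumerate(per_fold_selection(X, y)):
    print("fold", fold, "selected genes:", genes)
```

The truly different genes should turn up in every list, while the
remaining slot is filled by whichever noise gene happens to score best
in that particular training set.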
So when you use the toplist function on your GeneSelection object you
will find a number of feature lists, no two exactly the same. The
'informative' genes are those that occur most frequently across the
toplists. You can examine the GeneSelection toplist before you run
the classification function, but obviously you will want to run the
classification function to check that the features are indeed
'informative'.
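
Counting how often each gene occurs across the toplists can be
sketched as below. This is plain Python rather than CMA's own
interface, and the gene names and lists are invented placeholders.

```python
from collections import Counter

def gene_frequencies(toplists):
    """Count in how many toplists each gene appears, most frequent first."""
    counts = Counter()
    for toplist in toplists:
        counts.update(set(toplist))  # count a gene once per list
    # Sort by descending count, then by name so ties are deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical toplists, one per training set of the cross validation.
toplists = [
    ["geneA", "geneB", "geneC"],
    ["geneA", "geneC", "geneD"],
    ["geneA", "geneB", "geneE"],
]

for gene, n in gene_frequencies(toplists):
    print(gene, "selected in", n, "of", len(toplists), "training sets")
```

Genes selected in most or all of the training sets are the ones least
likely to owe their ranking to a particular split of the data.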
As for which GeneSelection method to choose, you can use the one that
gives the lowest cross-validation error. I'd start with limma, but if
there is a reasonable separation of classes then the methods should
all work similarly.
jeez I hope that is clear....
Stephen Henderson
UCL
On 27 Oct 2009, at 11:21, Juan Carlos Oliveros Collazos wrote:
> Dear all,
>
> I am starting to use the CMA package for classification of microarray
> samples.
>
> In particular, I want to know which genes are mainly responsible
> for separating about 60 lists of expression values into 2 groups
> that are already known. I understand that SVM is a good method to
> find the hyperplane that best separates the two groups but what I
> need are the genes, not the hyperplane parameters.
>
> My questions are:
>
> To get a list of genes, should I use SVMs in some manner (or another
> classification method), or do I simply need to identify the
> "informative" genes by using the GeneSelection function of the CMA
> package?
>
> If so, are the learning sets needed? Why?
>
> Any recommendation for choosing a gene selection method?
>
> Thanks in advance.
>
> best,
>
> Juan Carlos Oliveros
> CNB-CSIC, Madrid, Spain
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor