[BioC] svm (e1071) class weighting in a multi-class problem
Steve Lianoglou
mailinglist.honeypot at gmail.com
Thu Mar 8 13:30:43 CET 2012
Hi Javier,
2012/3/8 Javier Pérez Florido <jpflorido at gmail.com>:
> Dear list,
> I have a question about the class weighting parameter of the svm classifier
> in the e1071 package.
> Class weighting, as stated in the svm vignette of that package, is useful
> when class sizes are asymmetric. For example, for two classes A and
> B of 50 and 100 samples respectively, a weight of 2 can be assigned to class
> A and a weight of 1 to class B.
>
> However, what happens in a multi-class problem? In the e1071 package, svm
> follows the one-against-one approach, so for K classes, K(K-1)/2 binary
> classifiers are built. In my case, a different class weighting is desired
> depending on the comparison. For example, suppose a problem has 3
> classes:
> Class 1: 10 samples
> Class 2: 20 samples
> Class 3: 30 samples
>
> When a Class 1 vs Class 2 classifier is built, I would like to use a weight
> of 2 for Class 1 and a weight of 1 for Class 2.
> When a Class 1 vs Class 3 classifier is built, I would like to use a weight
> of 3 for Class 1 and a weight of 1 for Class 3.
> When a Class 2 vs Class 3 classifier is built, I would like to use a weight
> of 1.5 for Class 2 and a weight of 1 for Class 3.
>
> How can svm handle this issue? How does svm really handle this issue (class
> weighting for a multi-class problem)?
From skimming the e1071/src/svm.cpp code, it looks like each class
weight is a multiplier on the cost term C for that class. For
background, look at the "soft margin" section here:
http://en.wikipedia.org/wiki/Support_vector_machine
Specifically, the part of the optimization that includes the slack variables:
\min_{w, b, \xi} \; \ldots + C \sum_{i=1}^{n} \xi_i
Since C is just a multiplier on the sum of your slack variables, you can
split it into a separate C for each class, scaled by your class size
(or e1071 "weight"), something like:
\min_{w, b, \xi} \; \ldots + C_1 \sum_{i \in \mathrm{Class}_1} \xi_i
    + C_2 \sum_{j \in \mathrm{Class}_2} \xi_j
Ugh ... email LaTeX ... anyway, does that make sense?
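
To make the weighted penalty term concrete, here is a minimal sketch
in plain Python (not e1071 or libsvm code). It assumes the reading
above is right, i.e. that each sample's slack is charged at C times
the weight of its class. The function name and the toy slack values
are made up for illustration.

```python
def weighted_slack_penalty(slacks, labels, cost, class_weights):
    """Sum of cost * weight[label] * slack over all samples.

    Assumption: the per-class cost is cost * class_weights[label],
    which is how the class weights appear to be used.
    """
    return sum(cost * class_weights[y] * xi for xi, y in zip(slacks, labels))


# Toy values: classes A and B weighted 2 and 1, as in the two-class
# example from the question. The slacks are invented.
slacks = [0.5, 0.0, 1.0]
labels = ["A", "B", "B"]
weights = {"A": 2.0, "B": 1.0}

print(weighted_slack_penalty(slacks, labels, cost=1.0, class_weights=weights))
```

With all weights set to 1 this reduces to the plain C times the sum of
slacks from the first formula.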
I guess you can get pretty close to what you want by setting the class
weights to 3, 2, 1, but not exactly: the same weights are used in every
pairwise comparison, so class 1 vs 2 will be weighted 3 vs 2 rather than
2 vs 1, and class 2 vs 3 will be 2 vs 1 rather than 1.5 vs 1.
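
The arithmetic can be checked with a short sketch. It assumes that one
global weight per class is reused in every one-against-one classifier,
so only the ratio of the two weights differs from pair to pair. The
helper name `pairwise_ratios` and the second weight set (6, 3, 2,
proportional to 1/class size) are illustrative and not from the
thread.

```python
from fractions import Fraction
from itertools import combinations

class_sizes = {1: 10, 2: 20, 3: 30}  # class sizes from the question


def pairwise_ratios(sizes, weights):
    """For each class pair (a, b), return (a, b, wanted, implied).

    wanted  = size[b] / size[a], the per-pair ratio asked for
    implied = weight[a] / weight[b], what global weights give
    """
    rows = []
    for a, b in combinations(sorted(sizes), 2):
        wanted = Fraction(sizes[b], sizes[a])
        implied = Fraction(weights[a], weights[b])
        rows.append((a, b, wanted, implied))
    return rows


for name, weights in [("3, 2, 1", {1: 3, 2: 2, 3: 1}),
                      ("6, 3, 2", {1: 6, 2: 3, 3: 2})]:
    print(f"weights {name}:")
    for a, b, wanted, implied in pairwise_ratios(class_sizes, weights):
        print(f"  class {a} vs {b}: wanted {float(wanted):g}:1, "
              f"implied {float(implied):g}:1")
```

With weights 3, 2, 1 only the class 1 vs 3 ratio matches. Weights
proportional to 1/class size reproduce all three requested ratios, but
if the effective cost really is C times the weight, the absolute
penalty in each pair would still differ from setting the weights
separately per pair.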
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact