[BioC] Need help: no MTC possible

James W. MacDonald jmacdon at uw.edu
Mon Oct 15 15:33:09 CEST 2012


Hi Suparna,

On 10/15/2012 7:01 AM, suparna mitra wrote:
> Hi all,
>    I have been working on a project where I have Affymetrix HuGene 1.0 ST v1
> data, with three groups of patients and 6 samples in each group. I ran RMA
> normalization and filtered the data based on expression values (20%). I then
> ran unpaired t-tests on each pairwise combination of groups. The problem is
> that my data are extremely variable.
> I have tried filtering genes on variance and/or CV before testing, to
> reduce the number of genes entering the tests and the multiple-testing
> correction, but no reasonable filter has helped. I also do not have the
> option to increase the sample size of my project.
> I further checked for bad samples and bad probes in the experiment and
> removed outliers where these were not of interest. The result is still the
> same: when I run the t-test (or other possible tests such as Mann-Whitney)
> with MTC, no genes are significant.
> On the other hand, if I go without MTC and select a good p-value cutoff
> and a reasonable fold change, I get a list of significant genes which may be
> good or reasonable for my study. The problem is that I somehow need to
> justify the method behind my findings. Do you know of any study or paper
> where the data were treated without MTC?
> My main concern is: if I find a good story that matches the biological
> perspective, would it be at all possible to justify the method without MTC?

It's not clear to me what you are doing here - when you filter on 
variance are you keeping or removing the high variability genes 
(keeping, I hope)? I am also not sure what MTC stands for - is this 
multiple test correction?

Anyway, assuming I have things correct, some suggestions. First, you 
might want to use array weights when fitting your model. If you have a 
lot of intra-group variability, this will tend to help.
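
A minimal sketch of the array-weights idea, assuming limma is installed. The expression matrix is simulated as a stand-in for RMA-normalized data, and the object names (expr, group, design, aw) and the deliberately noisy third array are illustrative assumptions, not taken from the original analysis. arrayWeights() estimates one relative quality weight per array, and lmFit() can use those weights so that noisier arrays count for less.

```r
library(limma)

set.seed(1)

# Simulated stand-in for RMA-normalized log2 expression values:
# 1000 genes x 18 arrays (three groups of six). All names are hypothetical.
n_genes <- 1000
group <- factor(rep(c("A", "B", "C"), each = 6))
expr <- matrix(rnorm(n_genes * 18, mean = 8, sd = 0.3), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("array", 1:18)))

# Make one array much noisier than the rest, to mimic a poor-quality sample.
expr[, 3] <- expr[, 3] + rnorm(n_genes, sd = 1)

# Group-means design: one column per group.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Estimate one relative reliability weight per array.
aw <- arrayWeights(expr, design)

# Fit the linear model with the array weights, then form pairwise contrasts.
fit <- lmFit(expr, design, weights = aw)
cont <- makeContrasts(BvsA = B - A, CvsA = C - A, CvsB = C - B,
                      levels = design)
fit2 <- eBayes(contrasts.fit(fit, cont))

# Moderated t-statistics with Benjamini-Hochberg adjusted p-values.
tab <- topTable(fit2, coef = "BvsA", adjust.method = "BH", number = 10)
print(round(aw, 2))
```

In this simulation the noisy array should receive the smallest weight. The eBayes() step also borrows variance information across genes, which tends to help when there are only a few samples per group.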

Second, the t-test is the uniformly most powerful test (assuming 
the underlying data are relatively hump-shaped, i.e. roughly normal), so 
going to a non-parametric test will usually reduce rather than increase 
power to detect differences.
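
One concrete contributor to that loss of power with six samples per group can be sketched in base R. The number of genes tested (m = 20000) is an assumed round figure for illustration, not a value from the original post. With 6 vs 6 samples, the exact two-sided Mann-Whitney p-value cannot fall below 2 / choose(12, 6), so after a Benjamini-Hochberg adjustment a gene can only reach 0.05 if many other genes sit at that same floor.

```r
# Smallest possible exact two-sided Mann-Whitney p-value for 6 vs 6 samples:
# only the two fully separated orderings out of choose(12, 6) are that extreme.
n1 <- 6
n2 <- 6
p_min <- 2 / choose(n1 + n2, n1)

# Check against wilcox.test() on perfectly separated, tie-free data.
x <- 1:6
y <- 7:12
p_wilcox <- wilcox.test(x, y, exact = TRUE)$p.value

# Assumed number of genes tested (hypothetical round figure).
m <- 20000

# Under Benjamini-Hochberg, the k smallest p-values adjust to at best
# p_min * m / k, so k must be at least this large to get to 0.05 or below.
k_needed <- ceiling(p_min * m / 0.05)

# The t-test has no such floor; on the same separated data its p-value
# is smaller than the Mann-Whitney minimum.
p_t <- t.test(x, y)$p.value

cat("minimum Mann-Whitney p:", signif(p_min, 3), "\n")
cat("Welch t-test p on the same data:", signif(p_t, 3), "\n")
cat("genes needed at the floor for BH <= 0.05:", k_needed, "\n")
```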

Third, univariate (gene-by-gene) tests are arguably not the most 
sophisticated way of analyzing expression data, and you might get better 
(or at least more satisfactory) results if you instead analyzed groups of 
genes rather than individual genes.

Depending on your experiment, you could accomplish this task with a gene 
set analysis (there are multiple ways of doing this - perhaps the 
easiest being romer() and roast() in limma), or if you have phenotypic 
data, especially continuous measures, a WGCNA analysis might be of some use.
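
A minimal sketch of the gene set approach with roast() and romer(), assuming limma is installed. The data are simulated, and the gene sets ("shifted", "random"), the effect size, and the object names are hypothetical placeholders; in a real analysis the index vectors would come from annotated gene sets mapped to the rows of the expression matrix.

```r
library(limma)

set.seed(2)

# Simulated stand-in for normalized log2 expression: 500 genes x 12 arrays,
# covering two of the groups (6 vs 6). All names here are hypothetical.
n_genes <- 500
group <- factor(rep(c("A", "B"), each = 6))
expr <- matrix(rnorm(n_genes * 12, mean = 8, sd = 0.5), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

# Hypothetical gene set: the first 30 genes, each shifted up slightly in
# group B. Each shift is small relative to the gene-wise noise.
set_idx <- 1:30
expr[set_idx, group == "B"] <- expr[set_idx, group == "B"] + 0.4

# Design with an intercept; column 2 is the B vs A difference.
design <- model.matrix(~ group)

# Self-contained test: are the genes in the set differentially expressed?
res_roast <- roast(expr, index = set_idx, design = design, contrast = 2,
                   nrot = 999)

# Competitive test: does the set rank higher than the other genes?
gene_sets <- list(shifted = set_idx, random = sample(31:n_genes, 30))
res_romer <- romer(expr, index = gene_sets, design = design, contrast = 2,
                   nrot = 999)

print(res_roast)
print(res_romer)
```

Because a set-level test pools evidence across its member genes, it can detect a consistent small shift that no single gene would show after multiple-testing correction, and there are far fewer sets than genes to correct over.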

Best,

Jim


> Thanks a lot,
> Suparna.

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


