[BioC] Up & Downregulated genes using DESeq

Tue Mar 27 16:06:54 CEST 2012

Hi

On 03/27/2012 03:43 PM, Sunny Yu Liu wrote:
> Because no matter how fancy the statistics is, we have to find something
> having biological significance. For example, in most time, even have
> p<0.000000000001, still get no biological effects.
> For most gene expression change, people always use fold change 2 as a
> cutoff for microarray or qPCR. As for RNAseq, since the method is much
> more sensitive, I guess it must lose some specificity, so I think it may
> need a higher cutoff number than 2.

I think this needs some clarification before we confuse newcomers to 
RNA-Seq too much.

A cut-off of 2 on a log2 scale means a fold change of at least 
four-fold. This is a lot, and there are plenty of cases where a weaker 
signal is biologically meaningful. Depending on the experimental setup, 
fold changes of +/-20% (.26 on a log2 scale) or even much less can be 
informative.
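
To make the scale conversion concrete, here is a small Python sketch. 
The helper names are mine and purely illustrative; nothing here is part 
of DESeq.

```python
import math

def log2_fold_change(fold_change):
    """Convert a plain fold change (a ratio) to the log2 scale."""
    return math.log2(fold_change)

def fold_change(log2_fc):
    """Convert a log2 fold change back to a plain ratio."""
    return 2.0 ** log2_fc

print(fold_change(2))                    # a log2 cut-off of 2 is four-fold
print(round(log2_fold_change(1.2), 2))   # +20% on the log2 scale: 0.26
print(round(log2_fold_change(0.8), 2))   # -20% on the log2 scale: -0.32
```

Note that the log2 scale is symmetric in ratios, not in percentages: 
+20% is about 0.26, while -20% is about -0.32.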

In most RNA-Seq experiments, a p value cut-off chosen to control the 
false discovery rate at some sensible value, say 5% or 10%, will not 
allow any gene with a log2 fold change below a certain value to be 
called significant, and in my experience, this implicit threshold nearly 
always makes further fold-change cut-offs unnecessary.
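
For readers who have not seen how such a cut-off is applied, the 
following is a minimal sketch, assuming the Benjamini-Hochberg 
procedure is used for the adjustment (a common choice, but an 
assumption here). The p values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values, in the input order."""
    m = len(pvalues)
    # Indices of the p values, sorted from smallest to largest p value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up raw p values for five genes (illustration only).
pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
padj = benjamini_hochberg(pvals)

# Call a gene significant at a false discovery rate of 10%.
hits = [i for i, q in enumerate(padj) if q < 0.10]
print(hits)
```

Genes whose adjusted p value falls below the chosen rate are the 
"hits"; no separate fold-change filter enters at this stage.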

This belief that fold-change cut-offs are important may stem from the 
proliferation of incorrect analysis methods in the RNA-Seq literature. 
In very many papers, analysis methods based on Fisher's test or on a 
likelihood ratio test based on a Poisson distribution are used. In fact, 
there are even several reviews which suggest such approaches.

Apart from the fact that these tests are simply inadmissible (see e.g. 
Baggerly et al., Bioinformatics 19 (2003) 1477), they cause a peculiar 
pattern: They give all genes with expression strength above a few tens 
of thousands of reads absurdly low p values even for very weak fold 
changes, so that nearly all strongly expressed genes are significant. 
I've seen in several posts on this and other mailing lists the advice to 
use a very small p value threshold (adjusted p value < .001 or the 
like), combined with a fold change cut-off, to rectify the situation.
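
The pattern is easy to reproduce with a toy calculation. The sketch 
below is not Fisher's test, not a likelihood ratio test, and not what 
DESeq does; it is a simple normal-approximation z test on one count per 
condition, assuming equal library sizes. It compares pure Poisson 
variance with a negative-binomial-style variance, where the dispersion 
of 0.01 is an assumed value chosen only for illustration.

```python
import math

def two_sided_p(z):
    """Two-sided p value for a standard normal z score."""
    return math.erfc(abs(z) / math.sqrt(2))

def z_score(count_a, count_b, dispersion=0.0):
    """Normal-approximation z score for the difference of two counts.

    The variance of a single count is mu + dispersion * mu**2;
    dispersion=0 reduces this to the Poisson variance mu.
    """
    mu = (count_a + count_b) / 2.0         # pooled mean under the null
    var_one = mu + dispersion * mu ** 2    # variance of a single count
    return (count_b - count_a) / math.sqrt(2 * var_one)

# A strongly expressed gene with a weak (10%) fold change.
a, b = 50000, 55000
p_poisson = two_sided_p(z_score(a, b))              # Poisson noise only
p_overdispersed = two_sided_p(z_score(a, b, 0.01))  # assumed dispersion
print(p_poisson, p_overdispersed)
```

Under the Poisson assumption the 10% difference gets a vanishingly 
small p value; once even modest extra-Poisson variability is allowed 
for, the same difference is nowhere near significant.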

With a correct test, it is rare to get significant genes with a fold 
change so weak that you have to doubt their biological relevance.

I see, however, at least one case where a fold-change cut-off is useful: 
It is unavoidable that the decision boundary that separates significant 
from non-significant fold changes goes down with increasing count 
values. Hence, any list of "hits" will be enriched for strongly 
expressed genes. If this is problematic for downstream analysis, one may 
opt, as a crude but workable remedy, to decide on a fold-change cut-off 
and consider all genes below this as not significant _and_ omit from the 
universe of all enrichment tests all genes with a count value below the 
count required for this fold change to become significant (i.e., 
considering these weak genes as essentially "not testable" rather than 
"not significant").
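
One way this remedy could be sketched in Python is shown below. The 
function name, the record layout and the example genes are all 
hypothetical, and the way the required count is estimated (the lowest 
mean count among significant genes that fall below the fold-change 
cut-off) is my own crude stand-in for reading the value off the 
decision boundary.

```python
def filter_hits_and_universe(genes, lfc_cutoff, alpha=0.1):
    """genes: list of dicts with 'id', 'mean_count', 'log2fc', 'padj'."""
    significant = [g for g in genes if g["padj"] < alpha]
    # Crude estimate of the count at which a fold change at the cut-off
    # becomes significant: the lowest mean count among significant
    # genes whose |log2FC| lies below the cut-off.
    weak = [g["mean_count"] for g in significant
            if abs(g["log2fc"]) < lfc_cutoff]
    # If no significant gene is below the cut-off, nothing is excluded.
    min_count = min(weak) if weak else 0.0
    # Genes below min_count are "not testable": drop them everywhere.
    universe = [g["id"] for g in genes if g["mean_count"] >= min_count]
    hits = [g["id"] for g in significant
            if abs(g["log2fc"]) >= lfc_cutoff
            and g["mean_count"] >= min_count]
    return hits, universe, min_count

# Made-up example data (illustration only).
genes = [
    {"id": "g1", "mean_count": 20,   "log2fc": 3.0, "padj": 0.01},
    {"id": "g2", "mean_count": 15,   "log2fc": 0.8, "padj": 0.6},
    {"id": "g3", "mean_count": 500,  "log2fc": 1.2, "padj": 0.02},
    {"id": "g4", "mean_count": 800,  "log2fc": 0.9, "padj": 0.04},
    {"id": "g5", "mean_count": 5000, "log2fc": 0.3, "padj": 0.03},
    {"id": "g6", "mean_count": 6000, "log2fc": 1.5, "padj": 0.001},
    {"id": "g7", "mean_count": 900,  "log2fc": 0.1, "padj": 0.9},
]
hits, universe, min_count = filter_hits_and_universe(genes, lfc_cutoff=1.0)
print(hits, universe, min_count)
```

The point of returning the universe together with the hits is that any 
enrichment test downstream should use both, so that weakly expressed 
genes count neither for nor against a category.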

   Simon