[Bioc-sig-seq] "pooled" dispersion estimation in edgeR

Sat Jul 16 04:26:52 CEST 2011

Hi Sean,

On Fri, 15 Jul 2011, Sean Ruddy wrote:

> Hi Gordon,
>
> Thanks for the response. One of my data sets has 8 conditions and no 
> replicates and so I wanted to emulate DESeq's way of pooling the samples 
> and also use an offset matrix. I was hoping to avoid doing it manually 
> so that I don't mess it up. I could do this all in edgeR and pool the 
> samples but I'm not sure how well this would work under edgeR vs. DESeq.

edgeR has a very flexible interface, so there was no need to explicitly 
introduce a "pooled" method.  Instead, this sort of thing can be handled 
by the usual functions in the usual way.  Suppose you have a data object 
y, which includes an offset matrix:

    y$offset <- offset.matrix   # your tags-by-samples matrix of log-scale offsets

Then you can estimate the "pooled" dispersion simply by:

    y <- estimateGLMCommonDisp(y)

The fact that you don't supply a design matrix means that the samples are 
automatically treated as one group, i.e., pooled.  You can estimate 
trended or tagwise dispersions in the same way.  Then

    fit <- glmFit(y, design)   # etc.

will do any analysis you want using dispersions estimated when the samples 
were pooled.
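
As a rough sketch of the steps above, the whole workflow might look like the following. The counts are simulated and `offset.matrix` is a hypothetical stand-in for your own offsets; neither comes from Sean's data. It assumes a current edgeR installation, and that the offsets are on the natural-log scale, as edgeR's GLM functions expect.

```r
library(edgeR)

set.seed(1)
ngenes   <- 200
nsamples <- 8    # 8 conditions, no replicates

# Simulated raw counts (illustrative only)
counts <- matrix(rnbinom(ngenes * nsamples, mu = 50, size = 10),
                 nrow = ngenes, ncol = nsamples)
rownames(counts) <- paste0("tag", seq_len(ngenes))
colnames(counts) <- paste0("cond", seq_len(nsamples))

# Hypothetical offset matrix: log library size plus a small
# tag- and sample-specific perturbation, same dimensions as counts
offset.matrix <- matrix(log(colSums(counts)),
                        nrow = ngenes, ncol = nsamples, byrow = TRUE) +
                 matrix(rnorm(ngenes * nsamples, sd = 0.05),
                        nrow = ngenes, ncol = nsamples)

y <- DGEList(counts = counts)
y$offset <- offset.matrix

# No design matrix supplied, so all samples are treated as one group ("pooled")
y <- estimateGLMCommonDisp(y)

# The real design, one level per condition, is only used for fitting and testing
condition <- factor(colnames(counts))
design <- model.matrix(~ condition)

fit <- glmFit(y, design)       # uses the pooled dispersion stored in y
lrt <- glmLRT(fit, coef = 2)   # condition 2 versus condition 1
topTags(lrt, n = 5)
```

The raw counts are never adjusted here: the offsets enter the model directly, and the pooled dispersion is carried along in `y`.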

I and the other edgeR authors are anxious to get feedback, so write again 
if this doesn't turn out to be clear.

> I am curious though what sounds off to you in my previous email. I don't 
> feel entirely comfortable doing this manually but hopefully it's just 
> because I left out some details. I was trying to follow the DESeq method 
> and the only difference I saw was in the size factor calculations which 
> I changed for my own needs by using the offset values for each tag and 
> sample.

Even if you could estimate the variances yourself, I don't see any manual 
way that you could perform valid statistical tests, while correctly 
accounting for the offsets.  The whole negative binomial methodology 
requires genuine counts rather than adjusted counts.  So handling the 
offsets needs to be built-in.
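
A small base-R simulation may help show why adjusted counts are a problem. It is only an illustration under simplifying assumptions: Poisson counts are used instead of negative binomial, and the size factor `s` is a hypothetical constant. Dividing counts by a size factor changes the mean-variance relationship that count models rely on, whereas passing the same factor as an offset keeps the raw counts intact.

```r
set.seed(42)
n  <- 1e5
mu <- 100   # true mean of the raw counts
s  <- 4     # hypothetical size factor

raw      <- rpois(n, lambda = mu)
adjusted <- raw / s              # "adjusted counts", no longer integers

# Raw counts: the variance is about equal to the mean, as Poisson requires
ratio_raw <- var(raw) / mean(raw)

# Adjusted counts: mean is mu/s but variance is mu/s^2,
# so the variance-to-mean ratio is about 1/s instead of 1
ratio_adj <- var(adjusted) / mean(adjusted)

print(c(raw = ratio_raw, adjusted = ratio_adj))

# Built-in handling: model the raw counts and supply log(s) as an offset
fit <- glm(raw ~ 1, family = poisson(), offset = rep(log(s), n))
rate <- unname(exp(coef(fit)))   # estimated mean on the adjusted scale
print(rate)
```

The offset fit recovers the adjusted-scale mean while the likelihood is still evaluated on genuine counts, which is the same idea as the offset matrix in edgeR's GLM functions.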

Best wishes
Gordon

> I appreciate the help!
>
> Best,
> Sean
>
> On Fri, Jul 15, 2011 at 12:02 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>
>> Hi Sean,
>>
>> I'm curious to know why not use edgeR, since edgeR does what you want and
>> DESeq doesn't?
>>
>> I might be wrong, but the manual analysis that you describe doesn't sound
>> right.
>>
>> Best wishes
>> Gordon
>>
>>> Date: Thu, 14 Jul 2011 12:54:49 -0700
>>> From: Sean Ruddy <sruddy17 at gmail.com>
>>> To: bioc-sig-sequencing at r-project.org
>>> Subject: [Bioc-sig-seq] Supplying own variance functions and adjusted
>>>        counts  to a DESeq dataset
>>>
>>> Hi,
>>>
>>> I have an RNA-Seq count data set that requires separate offset values
>>> for each tag and sample. DESeq does not appear to take a matrix of
>>> offset values (unlike edgeR) in any of its functions, so I've carried
>>> out the analysis manually, i.e., calculating a size factor for each tag
>>> of each sample, adjusting the counts, then calculating the means and
>>> variances of the adjusted counts, and finally fitting a curve for each
>>> condition to the mean-variance plot using locfit().
>>>
>>> Essentially, I'd like to put these variance functions (or at least all
>>> the predicted variances) and the adjusted counts inside a DESeq object
>>> so that I can take advantage of the other functions DESeq offers
>>> (tests, plots, etc.).
>>>
>>> Thanks for the help!
>>>
>>> Sean
