[Bioc-sig-seq] Supplying own variance functions and adjusted counts to a DESeq dataset
Simon Anders
anders at embl.de
Sat Jul 16 13:00:12 CEST 2011
[repost, as my original post of yesterday somehow got dropped by the
mailing list manager]
Hi Sean
On 2011-07-14 21:54, Sean Ruddy wrote:
> I have an RNA-Seq count data set that requires separate offset values for
> each tag and sample. DESeq does not appear to take a matrix of offset values
> (unlike edgeR) in any of its functions, so I've carried out the analysis
> manually, i.e., calculating a size factor for each tag of each sample,
> adjusting the counts, then calculating means and variances of
> the adjusted counts, and finally fitting a curve for each condition to the
> mean-variance plot using locfit().
>
> Essentially, I'd like to put these variance functions (or at least all the
> predicted variances) and adjusted counts inside a DESeq object so that I can
> take advantage of the other functions DESeq offers (tests, plots, etc.).
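The manual procedure described in the quoted message (divide each count by its own tag- and sample-specific size factor, then take per-tag means and variances within one condition) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not DESeq code: the function names `adjust_counts` and `mean_and_variance` are hypothetical, and using the sample variance (n - 1 denominator) is an assumption about how the variances were computed.

```python
from statistics import mean, variance


def adjust_counts(counts, factors):
    """Divide each raw count by its own tag- and sample-specific size factor.

    counts and factors are lists of rows (one row per tag, one column per
    sample) with identical dimensions. Hypothetical helper, not a DESeq API.
    """
    if len(counts) != len(factors):
        raise ValueError("counts and factors must have the same number of tags")
    adjusted = []
    for count_row, factor_row in zip(counts, factors):
        if len(count_row) != len(factor_row):
            raise ValueError("counts and factors must have the same number of samples")
        adjusted.append([c / f for c, f in zip(count_row, factor_row)])
    return adjusted


def mean_and_variance(adjusted, columns):
    """Per-tag mean and sample variance (n - 1 denominator) of the adjusted
    counts over the given sample columns, e.g. the replicates of one condition."""
    result = []
    for row in adjusted:
        values = [row[j] for j in columns]
        result.append((mean(values), variance(values)))
    return result


# Two tags, three replicate samples of one condition (made-up numbers).
adj = adjust_counts([[8, 20, 36], [5, 5, 10]],
                    [[1.0, 2.0, 3.0], [0.5, 0.5, 1.0]])
print(adj)                                # [[8.0, 10.0, 12.0], [10.0, 10.0, 10.0]]
print(mean_and_variance(adj, [0, 1, 2]))  # [(10.0, 4.0), (10.0, 0.0)]
```

The resulting (mean, variance) pairs are what would then be plotted against each other and smoothed, e.g. with locfit() in R.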
We refactored things a bit in the devel version, and it is now easier to
inject your own variance estimates.
If you now run 'estimateDispersions', it adds columns 'disp_<cond>'
(where <cond> is the name of a condition, or "pooled" or "blind", depending
on the "method" argument) to the feature data slot. If you want to use
your own dispersion estimation scheme, you can simply store your values
in these columns, and the testing functions will use them.
However, I understand that you are actually happy with the estimation;
you just want to pass gene- and sample-specific size factors, presumably
to correct for GC bias. The next planned step in our refactoring effort
was to offer a slot where you could pass a matrix of values, of the same
dimensions as the count table, which would be multiplied by the size
factors each time they are used. From your post, I learned that the
edgeR authors were again faster than we were ;-) and have already added
such a feature. As demand for this will increase (e.g., to interface with
the new 'cqn' package that Hansen, Irizarry and Wu announced in their
recent preprint), we had better add it, too, I guess.
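The arithmetic behind such a slot is simple: every count gets an effective factor equal to its sample's size factor times the matrix entry for that gene and sample. A minimal Python sketch, assuming plain nested lists; `effective_factors` and `norm_factors` are hypothetical names, not DESeq or edgeR identifiers, and this is not how either package stores the values.

```python
def effective_factors(size_factors, norm_factors):
    """Multiply each sample's size factor s_j into column j of a
    gene-by-sample matrix of normalization factors, giving one factor per
    count. Hypothetical helper for illustration only."""
    n_samples = len(size_factors)
    for row in norm_factors:
        if len(row) != n_samples:
            raise ValueError("each row needs one factor per sample")
    return [[s * m for s, m in zip(size_factors, row)] for row in norm_factors]


size_factors = [0.5, 1.0, 2.0]      # one per sample, as DESeq estimates today
norm_factors = [[1.0, 1.0, 1.0],    # gene with no extra correction
                [2.0, 1.0, 0.5]]    # gene with, e.g., a GC-related correction
print(effective_factors(size_factors, norm_factors))
# [[0.5, 1.0, 2.0], [1.0, 1.0, 1.0]]
```

Dividing the count table element-wise by this matrix gives the same kind of adjusted counts as the manual per-tag size factors in the quoted message.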
Until then, have a look at the source code of DESeq: you will notice
that we cleanly separated the interface functions, which deal with
CountDataSet objects, from the calculation functions, which just work on
matrices. So, if you want to use a functionality that should be there
but is hard to use due to the format of the CountDataSet object, you can
typically call the core function directly. For example, the function
'estimateAndFitDispersionsFromBaseMeansAndVariances' takes vectors of
base means and variances and returns a mean-dispersion fit.
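To illustrate the kind of matrix-level step such a core function performs: as far as I recall the DESeq source, raw per-gene dispersions are obtained from the mean q and variance w of the normalized counts by subtracting the expected Poisson (shot) noise, alpha = (w - xi * q) / q^2 with xi = mean(1 / s_j) over the samples. That formula is stated from memory and should be checked against the actual source. The Python sketch below covers only this conversion, not the subsequent curve fit, and `raw_dispersions` is a hypothetical name.

```python
from statistics import mean


def raw_dispersions(base_means, base_variances, size_factors):
    """Convert per-gene means q and variances w of size-factor-normalized
    counts into raw negative-binomial dispersion estimates
    alpha = (w - xi * q) / q**2, with xi = mean(1 / s_j) over samples.
    Genes with a non-positive mean get None (no estimate possible).
    Hypothetical helper; formula assumed, see lead-in."""
    xi = mean(1.0 / s for s in size_factors)
    disps = []
    for q, w in zip(base_means, base_variances):
        if q <= 0:
            disps.append(None)
        else:
            disps.append((w - xi * q) / q ** 2)
    return disps


# xi = (1/0.5 + 1/2.0) / 2 = 1.25; for q = 10, w = 62.5: (62.5 - 12.5) / 100
print(raw_dispersions([10.0, 0.0], [62.5, 0.0], [0.5, 2.0]))  # [0.5, None]
```

Raw estimates can come out negative when the observed variance falls below the Poisson level; they are meant as input to a mean-dispersion fit rather than used directly. With gene- and sample-specific factors, one possible (untested) adaptation would be to compute xi separately for each gene from that gene's row of effective factors.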
Simon