[BioC] DESeq2 extracting results in version >= 1.3 (was: "using DESeq2 with multi factor data")
Michael Love
michaelisaiahlove at gmail.com
Mon Mar 24 16:23:44 CET 2014
hi Olga and Karen,
I updated the results() function in DESeq2 version 1.3 to simplify
extracting the kind of contrast we were discussing, combining main
effects and interactions. While I emailed the following code to build
a numeric contrast,
> ctrst <- numeric(21)
> rn <- resultsNames(dds)
> ctrst[ rn == "metasmet_no" ] <- -1
> ctrst[ rn == "metasmet_yes" ] <- 1
> ctrst[ rn == "timetime_2.metasmet_no" ] <- -1
> ctrst[ rn == "timetime_2.metasmet_yes" ] <- 1
> results(dds, contrast=ctrst)
this can now be accomplished by giving a list to the 'contrast'
argument. The first element of the list should be the effects for the
numerator of the log fold change, and the second element should be the
effects for the denominator:
results( dds, contrast = list(
c("metasmet_yes", "timetime_2.metasmet_yes"),
c("metasmet_no", "timetime_2.metasmet_no") ) )
There are more details and examples in the man page for ?results for
the development branch.
Mike
On Mon, Mar 17, 2014 at 9:33 AM, Michael Love
<michaelisaiahlove at gmail.com> wrote:
>
> hi Olga and Karen,
>
> Thanks for your interest in DESeq2 and for using the development
> branch, which allows us to clarify changes before they enter the
> release branch.
>
> This concerns a modeling change from v1.2 to v1.3. Changes like these
> are documented in the NEWS file with pointers to the
> functions/arguments that changed:
> (e.g. http://www.bioconductor.org/packages/2.14/bioc/news/DESeq2/NEWS)
> I've been working on including more documentation in the package, but
> haven't gotten to adding the parts about interaction terms yet, so
> this gives me a chance to explain a bit.
>
> The new resultsNames(dds) are due to "expanded model matrices", which
> include a column in the design matrix for each level of a factor, as
> opposed to "standard model matrices" produced by model.matrix, which
> have a column in the deisgn matrix for the levels other than the base
> level. Expanded model matrices are described a bit in the man pages
> and in the vignette, but not yet using an example with interaction
> effects. This makes the DE analysis independent of the choice to base
> level, whereas previously the log fold change (LFC) shrinkage was not
> independent of base level choices. In addition, it simplifies certain
> comparisons.
>
> You now have the columns "metasmet_no" and "metasmet_yes", whereas
> before you had "metas_met_yes_vs_met_no". Previously you would use
> the 'name' argument to extract this comparison, whereas with version
> >= 1.3 we want to encourage users to use the 'contrast' argument:
>
> results(dds, contrast=c("metas","met_yes","met_no"))
>
> If the analysis includes interaction terms, previously the above
> contrast would give the metastasis effect only for the base level of
> the other factors in the design, but for version >= 1.3, this gives
> the overall metastasis effect over all levels of the time variable.
>
> If you want the individual time-period-specific effects, this is the
> overall effect plus the interaction effects for that time period.You
> have the following columns of the model matrix:
>
> > [1] "Intercept" "timetime_1" "timetime_2" "timetime_3" "timetime_4" "timetime_5"
> > [7] "timetime_6" "metasmet_no" "metasmet_yes" "timetime_1.metasmet_no" "timetime_2.metasmet_no" "timetime_3.metasmet_no"
> > [13] "timetime_4.metasmet_no" "timetime_5.metasmet_no" "timetime_6.metasmet_no" "timetime_1.metasmet_yes" "timetime_2.metasmet_yes" "timetime_3.metasmet_yes"
> > [19] "timetime_4.metasmet_yes" "timetime_5.metasmet_yes" "timetime_6.metasmet_yes"
>
> So the metastasis effect specific to time period 2 would be, the
> contrast of "metasmet_yes" vs "metasmet_no" and additionally, the
> contrast of "timetime_2.metasmet_yes" vs "timetime_2.metasmet_no". We
> can pull these out using the numeric contrast vector, starting with
> all 0's and then filing in the -1's and 1's:
>
> ctrst <- numeric(21)
> rn <- resultsNames(dds)
> ctrst[ rn == "metasmet_no" ] <- -1
> ctrst[ rn == "metasmet_yes" ] <- 1
> ctrst[ rn == "timetime_2.metasmet_no" ] <- -1
> ctrst[ rn == "timetime_2.metasmet_yes" ] <- 1
> results(dds, contrast=ctrst)
>
>
> Mike
>
>
>
> On Mon, Mar 17, 2014 at 5:07 AM, solgakar at bi.technion.ac.il
> <solgakar at bi.technion.ac.il> wrote:
> >
> >
> >
> > From: Karen Chait [mailto:kchait at tx.technion.ac.il]
> > Sent: Monday, March 17, 2014 10:57 AM
> > To: Olga Karinsky
> > Subject: RE: using DESeq2 with multi factor data
> >
> > Hello all,
> > I am trying to use the DESeq2 package to perform RNA-Seq analysis on a data containing several factors.
> > I have been closely following the emails between Ming Yi and Michael Love, because I think that my problem is very similar to what they have discussed. But even though I received a lot of useful information from their discussion, I still have several questions regarding my specific data.
> > Just as an overall information regarding my data, I have 96 samples and the two factors I am interested in exploring are "time" and "metastasis".
> > In order to build my data set I used the following commands:
> >
> > > countData = read.table("merged_counts.txt", header=TRUE, row.names=1)
> >
> > > metasVector=c("met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes"(
> >
> > > timePointsVector=c("6","4","6","6","3","6","3","5","6","6","1","5","3","4","3","6","6","6","2","6","1","2","4","6","5","5","5","3","6","5","6","2","6","6","1","5","5","6","6","6","6","6","6","4","2","6","3","1","2","5","6","1","1","3","6","3","6","4","4","5","6","6","3","5","4","6","1","4","3","1","1","1","4","2","1","1","3","6","1","1","2","1","6","3","3","2","5","3","2","3","1","4","1","1","6","1")
> >
> > > colData=data.frame(row.names=colnames(countData),metas=metasVector,gender=gendarVector)
> >
> > > colData$metas=factor(colData$metas, levels=c("met_no","met_yes"))
> >
> > > colData$time = factor(colData$time, levels = c("1", "2", "3", "4", "5", "6"))
> >
> > > dds=DESeqDataSetFromMatrix(countData=tmpcountData, colData=colData, design=~time + metas + metas:time)
> >
> > > dds=DESeq(dds)
> > I have several questions:
> >
> > - first of all I have tried running those commands on DESeq2 version 1.2.10 (R version 3.0.2) and DESeq2 version 1.3.47 (R version 3.0.2) and what I have received from the resultsNames() function I both cases is very different. Using the 1.2.10 version I have received:
> >
> > > resultsNames(dds)
> >
> > [1] "Intercept" "time_2_vs_1" "time_3_vs_1" "time_4_vs_1" "time_5_vs_1" "time_6_vs_1" "metas_met_yes_vs_met_no" "time2.metasmet_yes"
> >
> > [9] "time3.metasmet_yes" "time4.metasmet_yes" "time5.metasmet_yes" "time6.metasmet_yes"
> >
> > > sessionInfo()
> >
> > R version 3.0.2 (2013-09-25)
> >
> > Platform: x86_64-w64-mingw32/x64 (64-bit)
> >
> >
> >
> > locale:
> >
> > [1] LC_COLLATE=Hebrew_Israel.1255 LC_CTYPE=Hebrew_Israel.1255 LC_MONETARY=Hebrew_Israel.1255 LC_NUMERIC=C LC_TIME=Hebrew_Israel.1255
> >
> >
> >
> > attached base packages:
> >
> > [1] parallel stats graphics grDevices utils datasets methods base
> >
> >
> >
> > other attached packages:
> >
> > [1] DESeq2_1.2.10 RcppArmadillo_0.4.100.2.1 Rcpp_0.11.0 GenomicRanges_1.14.4 XVector_0.2.0 IRanges_1.20.7 BiocGenerics_0.8.0
> >
> >
> >
> > loaded via a namespace (and not attached):
> >
> > [1] annotate_1.40.1 AnnotationDbi_1.24.0 Biobase_2.22.0 DBI_0.2-7 genefilter_1.44.0 grid_3.0.2 lattice_0.20-27 locfit_1.5-9.1 RColorBrewer_1.0-5 RSQLite_0.11.4
> >
> > [11] splines_3.0.2 stats4_3.0.2 survival_2.37-7 tools_3.0.2 XML_3.98-1.1 xtable_1.7-3
> >
> >
> >
> > Using the 1.3.47 version I have received:
> >
> > > resultsNames(dds)
> >
> > [1] "Intercept" "timetime_1" "timetime_2" "timetime_3" "timetime_4" "timetime_5"
> >
> > [7] "timetime_6" "metasmet_no" "metasmet_yes" "timetime_1.metasmet_no" "timetime_2.metasmet_no" "timetime_3.metasmet_no"
> >
> > [13] "timetime_4.metasmet_no" "timetime_5.metasmet_no" "timetime_6.metasmet_no" "timetime_1.metasmet_yes" "timetime_2.metasmet_yes" "timetime_3.metasmet_yes"
> >
> > [19] "timetime_4.metasmet_yes" "timetime_5.metasmet_yes" "timetime_6.metasmet_yes"
> >
> > > sessionInfo()
> >
> > R version 3.0.2 (2013-09-25)
> >
> > Platform: x86_64-w64-mingw32/x64 (64-bit)
> >
> >
> >
> > locale:
> >
> > [1] LC_COLLATE=Hebrew_Israel.1255 LC_CTYPE=Hebrew_Israel.1255 LC_MONETARY=Hebrew_Israel.1255 LC_NUMERIC=C LC_TIME=Hebrew_Israel.1255
> >
> >
> >
> > attached base packages:
> >
> > [1] parallel stats graphics grDevices utils datasets methods base
> >
> >
> >
> > other attached packages:
> >
> > [1] DESeq2_1.3.47 RcppArmadillo_0.4.100.0 Rcpp_0.11.0 GenomicRanges_1.14.4 XVector_0.2.0 IRanges_1.20.7
> >
> > [7] BiocGenerics_0.8.0
> >
> >
> >
> > loaded via a namespace (and not attached):
> >
> > [1] annotate_1.40.1 AnnotationDbi_1.24.0 Biobase_2.22.0 DBI_0.2-7 genefilter_1.44.0 geneplotter_1.40.0 grid_3.0.2
> >
> > [8] lattice_0.20-27 locfit_1.5-9.1 RColorBrewer_1.0-5 RSQLite_0.11.4 splines_3.0.2 stats4_3.0.2 survival_2.37-7
> >
> > [15] XML_3.98-1.1 xtable_1.7-3
> >
> > (I have ran the 1.3.47 version the same way besides a difference in the names of the time levels, but I do not believe that this is the reason for the differences)
> > I don't fully understand the results I receive using the 1.3.47 version and even more the difference between the versions.
> >
> > - From my understanding, the results I received using the 1.2.10 version are the more reasonable and they fit my settings of base levels in the data. Now after receiving these results I would love to understand how do I receive different contrast testing? For each time period metas_yes vs. metas_no (for example timetime_2.metasmet_yes vs. timetime_2.metasmet_no)
> >
> >
> >
> > Thank you in advance,
> >
> > Olga and Karen
> >
> >
> >
> > [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list