[Bioc-sig-seq] RNA-Seq - analysis of time series data?

Jakob Hedegaard Jakob.Hedegaard at agrsci.dk
Tue Nov 30 21:46:18 CET 2010


Hi Mark,
Thanks for your suggestions. I will try the different approached and see what I find.

As it is a time-series it would be convenient to be able to extract the list of "genes" being affected at one or more time points/contrastes.
Could the decideTestsDGE function be used for this purpose.
Or is there a function to conduct e.g. a F-test?

Best regards,
Jakob

-----Oprindelig meddelelse-----
Fra: Mark Robinson [mailto:mrobinson at wehi.EDU.AU]
Sendt: 25. november 2010 10:41
Til: Jakob Hedegaard; bioc-sig-sequencing at r-project.org
Emne: Re: [Bioc-sig-seq] RNA-Seq - analysis of time series data?

Hi Jakob.

Just another thought on this.  One alternative for your analysis may be to treat it as a 1-way setup [using d from d <- DGEList(counts=rd,group=group)] and use the non-GLM methods.  This means that you would be limited to pairwise comparisons, but that may suffice.

Cheers,
Mark

On 2010-11-23, at 9:43 PM, Mark Robinson wrote:

> Hi Jakob.
>
> The error message is due to the fact that your design matrix is not full rank (i.e. sum of columns 4-7 = sum of columns 2-3).  This is of course my fault, since I suggested it.
>
> That setup (previous post) would've worked for an experiment with observations for every combination of factors, which you do not have (controls are only measured at T=0, AP2/AP6 are not measured at T=0).  Non-full-rank design matrices can be difficult to interpret, so an alternative option is this:
>
> S <- factor(rep( c("Acon","AP6","AP2"),
>               c(6,5+6+6+5,5+6+5+4)))
> T <- factor(rep( c("T00","T06","T12","T24","T48","T06","T12","T24","T48"),
>               c(6,5,6,6,5,5,6,5,4)))
>
> f <- factor( paste(S,T,sep=".") )
> design <- model.matrix( ~ f )
> [...]
>
> As mentioned below, you may need to adjust the reference (here again, control at T=0) for testing a particular contrast of interest.
>
> Let us know how you go.
>
> Cheers,
> Mark
>
> On 2010-11-23, at 12:25 AM, Jakob Hedegaard wrote:
>
>> Hi Mark,
>>
>> Thanks.
>>
>> I have implemented your suggestion for the design matrix when reading the data (see below), but encounter the following error when running estimateCRDisp:
>>
>>> d <- estimateCRDisp(d,design)
>> Error in chol.default(crossprod(design, .vecmat(mu/(1 + dispersion * mu),  :
>> the leading minor of order 7 is not positive definite
>>
>> Applying the simple design:
>> design.time <- model.matrix(~ T, data=d$sample)
>> does not result in the error....
>>
>> Appreciate your help!
>>
>> Best regards,
>> Jakob
>>
>>
>>
>>
>> library(edgeR)
>> r <- read.delim(file.path(datadir, "RNAseq_AP-2-6-INF_liver_transcript_pivot.txt"))
>> dim(r)
>> #[1] 78728   205
>> rdata <- r[2:205]
>> #samp <- substr(colnames(r)[2:205],6,10)
>> #usamp <- unique(samp)
>> samp <- substr(colnames(r)[2:205],16,28)
>> usamp <- unique(samp)
>> rd <- matrix(NA,nrow=nrow(r),ncol=length(usamp))
>> for (i in 1:length(usamp)){
>> rd[,i] <- rowSums(rdata[,grepl(usamp[i],samp)])
>> }
>> colnames(rd) <- usamp
>> rownames(rd) <- r[,1]
>> # export data
>> #write.table(rd,file="RNAseq_AP-2-6-INF_liver_vs_AugSsc9_pivot.txt",sep="\t",quote=F,dec=".",row.names=T,col.names=NA)
>>
>> group <- substr(unique(substr(colnames(r)[2:205],16,28)),7,13)
>> #[1] "control" "control" "control" "control" "control" "control" "AP6.T06" "AP6.T06" "AP6.T06" "AP6.T06" "AP6.T06" "AP6.T12"
>> #[13] "AP6.T12" "AP6.T12" "AP6.T12" "AP6.T12" "AP6.T12" "AP6.T24" "AP6.T24" "AP6.T24" "AP6.T24" "AP6.T24" "AP6.T24" "AP6.T48"
>> #[25] "AP6.T48" "AP6.T48" "AP6.T48" "AP6.T48" "AP2.T06" "AP2.T06" "AP2.T06" "AP2.T06" "AP2.T06" "AP2.T12" "AP2.T12" "AP2.T12"
>> #[37] "AP2.T12" "AP2.T12" "AP2.T12" "AP2.T24" "AP2.T24" "AP2.T24" "AP2.T24" "AP2.T24" "AP2.T48" "AP2.T48" "AP2.T48" "AP2.T48"
>> d <- DGEList(counts=rd,group=group)
>> d <- calcNormFactors(d)
>> d$samples$S <- factor(gsub("con","Acon",substr(d$samples$group,1,3)))
>> d$samples$T <- factor(gsub("rol","T00",(substr(d$samples$group,5,7))))
>> design <- model.matrix(~ S + T, data=d$sample)
>> # Analysis using Cox-Reid common dispersion
>> d <- estimateCRDisp(d,design)
>>
>>> design
>>             (Intercept) SAP2 SAP6 TT06 TT12 TT24 TT48
>> AP_01_control           1    0    0    0    0    0    0
>> AP_02_control           1    0    0    0    0    0    0
>> AP_03_control           1    0    0    0    0    0    0
>> AP_28_control           1    0    0    0    0    0    0
>> AP_29_control           1    0    0    0    0    0    0
>> AP_30_control           1    0    0    0    0    0    0
>> AP_31_AP6.T06           1    0    1    1    0    0    0
>> AP_32_AP6.T06           1    0    1    1    0    0    0
>> AP_33_AP6.T06           1    0    1    1    0    0    0
>> AP_34_AP6.T06           1    0    1    1    0    0    0
>> AP_36_AP6.T06           1    0    1    1    0    0    0
>> AP_37_AP6.T12           1    0    1    0    1    0    0
>> AP_38_AP6.T12           1    0    1    0    1    0    0
>> AP_39_AP6.T12           1    0    1    0    1    0    0
>> AP_40_AP6.T12           1    0    1    0    1    0    0
>> AP_41_AP6.T12           1    0    1    0    1    0    0
>> AP_42_AP6.T12           1    0    1    0    1    0    0
>> AP_43_AP6.T24           1    0    1    0    0    1    0
>> AP_44_AP6.T24           1    0    1    0    0    1    0
>> AP_45_AP6.T24           1    0    1    0    0    1    0
>> AP_46_AP6.T24           1    0    1    0    0    1    0
>> AP_47_AP6.T24           1    0    1    0    0    1    0
>> AP_48_AP6.T24           1    0    1    0    0    1    0
>> AP_49_AP6.T48           1    0    1    0    0    0    1
>> AP_50_AP6.T48           1    0    1    0    0    0    1
>> AP_51_AP6.T48           1    0    1    0    0    0    1
>> AP_53_AP6.T48           1    0    1    0    0    0    1
>> AP_54_AP6.T48           1    0    1    0    0    0    1
>> AP_55_AP2.T06           1    1    0    1    0    0    0
>> AP_56_AP2.T06           1    1    0    1    0    0    0
>> AP_57_AP2.T06           1    1    0    1    0    0    0
>> AP_58_AP2.T06           1    1    0    1    0    0    0
>> AP_59_AP2.T06           1    1    0    1    0    0    0
>> AP_60_AP2.T12           1    1    0    0    1    0    0
>> AP_62_AP2.T12           1    1    0    0    1    0    0
>> AP_63_AP2.T12           1    1    0    0    1    0    0
>> AP_64_AP2.T12           1    1    0    0    1    0    0
>> AP_65_AP2.T12           1    1    0    0    1    0    0
>> AP_66_AP2.T12           1    1    0    0    1    0    0
>> AP_67_AP2.T24           1    1    0    0    0    1    0
>> AP_69_AP2.T24           1    1    0    0    0    1    0
>> AP_70_AP2.T24           1    1    0    0    0    1    0
>> AP_71_AP2.T24           1    1    0    0    0    1    0
>> AP_72_AP2.T24           1    1    0    0    0    1    0
>> AP_73_AP2.T48           1    1    0    0    0    0    1
>> AP_75_AP2.T48           1    1    0    0    0    0    1
>> AP_76_AP2.T48           1    1    0    0    0    0    1
>> AP_78_AP2.T48           1    1    0    0    0    0    1
>> attr(,"assign")
>> [1] 0 1 1 2 2 2 2
>> attr(,"contrasts")
>> attr(,"contrasts")$S
>> [1] "contr.treatment"
>>
>> attr(,"contrasts")$T
>> [1] "contr.treatment"
>>
>>
>> -----Oprindelig meddelelse-----
>> Fra: Mark Robinson [mailto:mrobinson at wehi.EDU.AU]
>> Sendt: 18. november 2010 05:51
>> Til: Jakob Hedegaard
>> Cc: bioc-sig-sequencing at r-project.org
>> Emne: Re: [Bioc-sig-seq] RNA-Seq - analysis of time series data?
>>
>> Hi Jakob.
>>
>> Basically, you can analyze your time course data with edgeR through glmFit() as you have already done, but by choosing an appropriate design matrix that will allow you to test your contrasts of interest.
>>
>> For example, you might define a couple factors (please double check that this actually matches your $samples element of the DGEList object):
>>
>> S <- factor(rep( c("Acon","AP6","AP2"),
>>               c(6,5+6+6+5,5+6+5+4)))
>> T <- factor(rep( c("T00","T06","T12","T24","T48","T06","T12","T24","T48"),
>>               c(6,5,6,6,5,5,6,5,4)))
>>
>> ... and then a design matrix:
>>
>> design <- model.matrix( ~ S + T )
>>
>> So, this design matrix will look like this:
>>
>>
>>> design <- model.matrix( ~ S + T )
>>> design
>> (Intercept) SAP2 SAP6 TT06 TT12 TT24 TT48
>> 1            1    0    0    0    0    0    0
>> 2            1    0    0    0    0    0    0
>> 3            1    0    0    0    0    0    0
>> 4            1    0    0    0    0    0    0
>> 5            1    0    0    0    0    0    0
>> 6            1    0    0    0    0    0    0
>> 7            1    0    1    1    0    0    0
>> 8            1    0    1    1    0    0    0
>> 9            1    0    1    1    0    0    0
>> 10           1    0    1    1    0    0    0
>> 11           1    0    1    1    0    0    0
>> 12           1    0    1    0    1    0    0
>> 13           1    0    1    0    1    0    0
>> 14           1    0    1    0    1    0    0
>> 15           1    0    1    0    1    0    0
>> 16           1    0    1    0    1    0    0
>> 17           1    0    1    0    1    0    0
>> 18           1    0    1    0    0    1    0
>> 19           1    0    1    0    0    1    0
>> 20           1    0    1    0    0    1    0
>> 21           1    0    1    0    0    1    0
>> 22           1    0    1    0    0    1    0
>> 23           1    0    1    0    0    1    0
>> 24           1    0    1    0    0    0    1
>> 25           1    0    1    0    0    0    1
>> 26           1    0    1    0    0    0    1
>> 27           1    0    1    0    0    0    1
>> 28           1    0    1    0    0    0    1
>> 29           1    1    0    1    0    0    0
>> 30           1    1    0    1    0    0    0
>> 31           1    1    0    1    0    0    0
>> 32           1    1    0    1    0    0    0
>> 33           1    1    0    1    0    0    0
>> 34           1    1    0    0    1    0    0
>> 35           1    1    0    0    1    0    0
>> 36           1    1    0    0    1    0    0
>> 37           1    1    0    0    1    0    0
>> 38           1    1    0    0    1    0    0
>> 39           1    1    0    0    1    0    0
>> 40           1    1    0    0    0    1    0
>> 41           1    1    0    0    0    1    0
>> 42           1    1    0    0    0    1    0
>> 43           1    1    0    0    0    1    0
>> 44           1    1    0    0    0    1    0
>> 45           1    1    0    0    0    0    1
>> 46           1    1    0    0    0    0    1
>> 47           1    1    0    0    0    0    1
>> 48           1    1    0    0    0    0    1
>> attr(,"assign")
>> [1] 0 1 1 2 2 2 2
>> attr(,"contrasts")
>> attr(,"contrasts")$S
>> [1] "contr.treatment"
>>
>> attr(,"contrasts")$T
>> [1] "contr.treatment"
>>
>>
>> Note that I've intentionally changed "con" to "Acon" in order to make it the control group ... so, the intercept term above represents the abundance parameter for the control/T00 samples.  Then, you can select your comparison of interest through the 'coef' argument of the glmLRT() function.  Note that you can select multiple columns.
>>
>> By judicious choice of the design matrix, you should be able to fit any contrast of interest. Currently you will probably need to redefine the design matrix for a different reference sample to obtain different contrasts.  We will be adding a 'contrasts' argument to glmLRT() shortly, which will make it much easier to investigate different contrasts of interest.
>>
>> Hope that helps.
>>
>> Cheers,
>> Mark
>>
>>
>> On 2010-11-18, at 1:32 AM, Jakob Hedegaard wrote:
>>
>>> Hi,
>>>
>>> I wonder how to analyze a RNA-Seq dataset of a time-series experiment.
>>>
>>> The dataset origin from a challenge experiment with 4-6 samples per time point (T00,T06,T12,T24 and T48) from two series of challenge (with one of two different serotypes). In total 48 individual samples and two factors, time and serotype (see the table below for details)
>>> RNA-Seq profiles have been obtained using Illumina GAIIx with multiplexing.
>>>
>>> I have used edgeR (the Cox-Reid and GLM method) to obtain the genes significantly affected to each time point relative to time zero (e.g. T12-T00) - thus initially ignoring the potential difference between the serotypes.
>>>
>>> But how can I obtain the genes significantly affected across the different contrastes?
>>>
>>>
>>>      group lib.size norm.factors serotype time
>>> AP_01 control  3734226    1.0575860      con  T00
>>> AP_02 control  4528260    1.0581673      con  T00
>>> AP_03 control  3648163    1.0594602      con  T00
>>> AP_28 control  4671178    1.0430147      con  T00
>>> AP_29 control  3746020    1.0528085      con  T00
>>> AP_30 control  4471915    1.1277386      con  T00
>>> AP_31 AP6.T06  7384334    0.9187757      AP6  T06
>>> AP_32 AP6.T06  3649621    0.9035999      AP6  T06
>>> AP_33 AP6.T06  5644324    0.8929802      AP6  T06
>>> AP_34 AP6.T06  3791540    0.9396600      AP6  T06
>>> AP_36 AP6.T06  3922243    0.8524249      AP6  T06
>>> AP_37 AP6.T12  3113854    1.0491183      AP6  T12
>>> AP_38 AP6.T12  2153867    1.0996506      AP6  T12
>>> AP_39 AP6.T12  2979503    1.0905274      AP6  T12
>>> AP_40 AP6.T12  5375493    1.0420513      AP6  T12
>>> AP_41 AP6.T12  3769654    0.9094147      AP6  T12
>>> AP_42 AP6.T12  2621303    1.1272673      AP6  T12
>>> AP_43 AP6.T24  3537847    1.0037276      AP6  T24
>>> AP_44 AP6.T24  3660552    1.0757808      AP6  T24
>>> AP_45 AP6.T24  3284038    1.0701603      AP6  T24
>>> AP_46 AP6.T24  3250374    1.0230341      AP6  T24
>>> AP_47 AP6.T24  7208387    0.9535068      AP6  T24
>>> AP_48 AP6.T24  4169075    0.9730747      AP6  T24
>>> AP_49 AP6.T48  5989902    0.9825794      AP6  T48
>>> AP_50 AP6.T48  3529028    0.9471979      AP6  T48
>>> AP_51 AP6.T48  5104071    1.0772029      AP6  T48
>>> AP_53 AP6.T48  4823387    0.9710606      AP6  T48
>>> AP_54 AP6.T48  3788201    1.0976409      AP6  T48
>>> AP_55 AP2.T06  4919578    0.7872065      AP2  T06
>>> AP_56 AP2.T06  4580068    0.9533078      AP2  T06
>>> AP_57 AP2.T06  3908180    0.9193207      AP2  T06
>>> AP_58 AP2.T06  3466801    0.9887874      AP2  T06
>>> AP_59 AP2.T06  4267998    0.8769085      AP2  T06
>>> AP_60 AP2.T12  4533905    0.9058305      AP2  T12
>>> AP_62 AP2.T12  5906089    0.9352388      AP2  T12
>>> AP_63 AP2.T12  3676318    1.1260072      AP2  T12
>>> AP_64 AP2.T12  2206081    1.0579246      AP2  T12
>>> AP_65 AP2.T12  3955338    1.0221930      AP2  T12
>>> AP_66 AP2.T12  3775918    0.9300664      AP2  T12
>>> AP_67 AP2.T24  3853681    0.9659259      AP2  T24
>>> AP_69 AP2.T24  3761829    0.9592286      AP2  T24
>>> AP_70 AP2.T24  4263273    1.0742397      AP2  T24
>>> AP_71 AP2.T24  4736798    0.9864702      AP2  T24
>>> AP_72 AP2.T24  6387114    1.0462401      AP2  T24
>>> AP_73 AP2.T48  3389351    1.0303610      AP2  T48
>>> AP_75 AP2.T48  1489023    1.1863821      AP2  T48
>>> AP_76 AP2.T48  3588175    1.0250639      AP2  T48
>>> AP_78 AP2.T48  3848562    0.9855582      AP2  T48
>>>
>>>
>>> sessionInfo()
>>> R version 2.12.0 (2010-10-15)
>>> Platform: x86_64-pc-mingw32/x64 (64-bit)
>>>
>>> locale:
>>> [1] LC_COLLATE=Danish_Denmark.1252  LC_CTYPE=Danish_Denmark.1252    LC_MONETARY=Danish_Denmark.1252 LC_NUMERIC=C                    LC_TIME=Danish_Denmark.1252
>>>
>>> attached base packages:
>>> [1] grDevices datasets  splines   graphics  stats     tcltk     utils     methods   base
>>>
>>> other attached packages:
>>> [1] edgeR_1.8.2     svSocket_0.9-50 TinnR_1.0.3     R2HTML_2.2      Hmisc_3.8-3     survival_2.35-8
>>>
>>> loaded via a namespace (and not attached):
>>> [1] cluster_1.13.1  grid_2.12.0     lattice_0.19-13 limma_3.6.6     svMisc_0.9-60   tools_2.12.0
>>>
>>>
>>> Best regards
>>>
>>> Jakob Hedegaard
>>> Project scientist
>>>
>>> AARHUS UNIVERSITY
>>> Faculty of Agricultural Sciences
>>> Dept. of Genetics and Biotechnology
>>> Blichers Allé 20, P.O. BOX 50
>>> DK-8830 Tjele
>>> Denmark
>>>
>>> _______________________________________________
>>> Bioc-sig-sequencing mailing list
>>> Bioc-sig-sequencing at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>> ------------------------------
>> Mark Robinson, PhD (Melb)
>> Epigenetics Laboratory, Garvan
>> Bioinformatics Division, WEHI
>> e: mrobinson at wehi.edu.au
>> e: m.robinson at garvan.org.au
>> p: +61 (0)3 9345 2628
>> f: +61 (0)3 9347 0852
>> ------------------------------
>>
>>
>> ______________________________________________________________________
>> The information in this email is confidential and inte...{{dropped:24}}
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

------------------------------
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: mrobinson at wehi.edu.au
e: m.robinson at garvan.org.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
------------------------------


______________________________________________________________________
The information in this email is confidential and intend...{{dropped:6}}



More information about the Bioc-sig-sequencing mailing list