[BioC] Affy data analysis

James W. MacDonald jmacdon at uw.edu
Mon Apr 16 15:49:10 CEST 2012


Hi Himanshu Sharma,

On 4/14/2012 6:42 PM, hsharm03 at students.poly.edu wrote:
> Dear all,
>
> I have data from affy HT430mgpm and I need to analyze the data for
> differential expression and pathway analysis. I have 3 wildtype controls
> (Wt neurospheres 2 and 3) for the control analysis. I have two other
> tumors (1509 and 1701) for the analysis. From the cel files, it doesn’t
> appear that we did replicates for the tumors, just one each, the
> rationale at the time being that we wanted to first quickly scan the
> tumors for common signatures. Those genes that are clearly highly
> expressed should, however, represent additional oncogenic signatures
> that may stem from the same or related activating pathways.
>
> For now, my analysis of the controls should give me accurate expression
> data for the controls. The tumors will have to be compared across the
> samples to look for the low-hanging fruit.
>
> I am not sure how to go about doing this, since I have 3 replicates for
> the control but only 1 each for the different tumors. What strategy
> should I use for my analysis?

You can just analyze your data as indicated in the limma User's Guide. 
Note that although you only have one sample for each of the tumors, 
the three control replicates leave you with 2 residual degrees of 
freedom (5 arrays minus 3 group means), so you can actually fit a model 
and compute contrasts. Here is an example using some fake data:

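 > library(limma)
 > # fake data: 100,000 "probesets" x 5 arrays (3 controls, then 1 array per tumor)
 > x <- matrix(rnorm(5e5), ncol = 5)
 > # intercept = control mean; coefficients 2 and 3 = each tumor minus control
 > design <- model.matrix(~factor(rep(1:3, c(3,1,1))))
 > fit <- lmFit(x, design)
 > fit2 <- eBayes(fit)
 > topTable(fit2, 2)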
          logFC         t      P.Value adj.P.Val         B
27913 -5.164721 -4.474076 7.678459e-06 0.6669534 -4.402008
98975  4.907831  4.251539 2.124031e-05 0.6669534 -4.421736
90287  4.800002  4.158128 3.209996e-05 0.6669534 -4.429717
41684 -4.754741 -4.118920 3.808058e-05 0.6669534 -4.433015
43210 -4.711426 -4.081397 4.478309e-05 0.6669534 -4.436141
46761  4.705393  4.076171 4.580108e-05 0.6669534 -4.436574
37345 -4.687702 -4.060846 4.891387e-05 0.6669534 -4.437841
98788  4.633203  4.013635 5.981260e-05 0.6669534 -4.441714
46584  4.606493  3.990496 6.595873e-05 0.6669534 -4.443596
72789 -4.603451 -3.987861 6.669534e-05 0.6669534 -4.443809
 > topTable(fit2, 3)
          logFC         t      P.Value adj.P.Val         B
19401 -5.232576 -4.532857 5.822486e-06 0.5822486 -1.796077
883    4.813581  4.169892 3.048726e-05 0.8544860 -2.252617
87408 -4.667879 -4.043673 5.263993e-05 0.8544860 -2.402452
76730  4.641339  4.020682 5.805112e-05 0.8544860 -2.429249
50261  4.533133  3.926946 8.605996e-05 0.8544860 -2.536920
63980  4.502927  3.900780 9.591473e-05 0.8544860 -2.566524
783   -4.498102 -3.896600 9.758446e-05 0.8544860 -2.571235
59496 -4.441207 -3.847313 1.194575e-04 0.8544860 -2.626398
92491  4.427735  3.835642 1.252750e-04 0.8544860 -2.639357
22351 -4.420041 -3.828977 1.287163e-04 0.8544860 -2.646741

As you can see, limma is happy to run the analysis without any 
replication for two of the sample types.
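
If you also want to compare the two tumors directly, one way to set that 
up is a group-means design with explicit contrasts. This is only a sketch 
on fake data: the group labels (wt, t1509, t1701), the column names, and 
the contrast names are placeholders I made up, and the seed is there just 
so the sketch is reproducible.

```r
library(limma)

set.seed(1)  # fake data only; the seed is arbitrary
# 1000 fake probesets x 5 arrays: 3 controls, then one array per tumor
x <- matrix(rnorm(5000), ncol = 5)
colnames(x) <- c("wt1", "wt2", "wt3", "t1509", "t1701")  # hypothetical names

# Hypothetical group labels, with the control group as the first level
group <- factor(c("wt", "wt", "wt", "t1509", "t1701"),
                levels = c("wt", "t1509", "t1701"))

# Group-means parameterisation: one coefficient per group, no intercept
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

fit <- lmFit(x, design)
# 5 arrays - 3 coefficients leaves 2 residual degrees of freedom per probeset
unique(fit$df.residual)

# Each tumor against control, plus the two tumors against each other
cont <- makeContrasts(
  t1509_vs_wt    = t1509 - wt,
  t1701_vs_wt    = t1701 - wt,
  t1509_vs_t1701 = t1509 - t1701,
  levels = design
)
fit2 <- eBayes(contrasts.fit(fit, cont))
topTable(fit2, coef = "t1509_vs_wt", number = 5)
```

Keep in mind that each tumor group has a single array, so its fitted 
value is just the observed value and the residual variance for every 
probeset comes entirely from the three controls. The tumor comparisons 
therefore rest on the assumption that the tumors are about as variable 
as the controls.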

Best,

Jim


> Thanks,
> Himanshu Sharma

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099