[BioC] Using limma to identify differentially expressed genes
James W. MacDonald
jmacdon at uw.edu
Wed Apr 11 14:56:12 CEST 2012
Hi Avoks,
On 4/11/2012 7:28 AM, Ovokeraye Achinike-Oduaran wrote:
> Hi Jim,
>
> I digress a bit (sorry David). But I was looking at your code
> combining both getGEO and getGEOSuppFiles. The analysis you did I'm
> guessing is based on the raw files because you had to read in an
> affybatch object. I'm having a challenge with making my covdesc.txt
> file to work for me with read.affy(), so I'm wondering if your
> combination is a way to retrieve the phenotypic data without having to
> manually create the text file. In other words, my question is: does
> this combination of getGEO() and getGEOSuppFiles() make it possible to
> boycott the use of the manually created covdesc.txt file in
> read.affy()?
I assume that a covdesc.txt file is something you want to use for the
phenoData slot of your ExpressionSet? If so, note that you don't have to
explicitly create or use the phenoData slot; it is there in order to
make your ExpressionSet self-descriptive for others.
Unless you are planning to give your ExpressionSet to somebody else, I
don't see a pressing reason to ever bother with creating and using the
phenoData. You already know what is in there, and can easily create any
design matrices, etc for further analyses.
But to answer your question, I only used getGEO() in order to get the
phenoData so I could easily create a design matrix without having to
figure out which sample is which.
Best,
Jim
>
> Thanks.
>
> Avoks
>
>
> On Tue, Apr 10, 2012 at 5:14 PM, David Westergaard<david at harsk.dk> wrote:
>> Hello,
>>
>> I've been trying to use limma to identify genes from the following
>> data: http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-21340 -
>> It's a simple control vs. disease experiment
>>
>>
>> # SDRF downloaded from above page
>> SDRF<- read.table(file="E-GEOD-21340.sdrf.txt",header=TRUE,stringsAsFactors=FALSE,sep="\t")
>>
>> # Looking to compare Family-history negativer versus Diabetis
>> controls<- SDRF[grep("Control, Family History
>> Neg",SDRF$Comment..Sample_source_name.),]
>> disease<- SDRF[grep("^DM",SDRF$Characteristics.DiseaseState.),]
>> Batch<- rbind(controls,disease)
>>
>> # Read in CEL files
>> mixture.batch<- ReadAffy(filenames=Batch$Array.Data.File)
>>
>> # Preprocess data
>> mixture.processed<- expresso(mixture.batch, bgcorrect.method = "rma",
>> normalize.method = "quantiles", pmcorrect.method = "pmonly",
>> summary.method = "medianpolish")
>>
>> # Get data in matrix
>> signals<- exprs(mixture.prepared)
>> cl<- ifelse(colnames(signals) %in% disease$Array.Data.File,1,0)
>>
>> # Do design matrix and fit
>> design<- model.matrix(~factor(cl))
>> fit<- lmFit(signals,design)
>> fit<- eBayes(fit)
>> topTable(fit2,coef=2)
>>
>> Which yields the following:
>> ID logFC AveExpr t P.Value adj.P.Val B
>> 7513 208004_at -0.323 5.43 -4.65 0.00191 0.999 -3.10
>> 11225 211829_s_at 0.340 5.07 4.36 0.00278 0.999 -3.17
>> 5950 206424_at -0.907 6.65 -4.15 0.00363 0.999 -3.23
>> 1354 201826_s_at -0.447 8.37 -4.13 0.00374 0.999 -3.24
>> 19782 220418_at 0.392 5.43 4.02 0.00431 0.999 -3.27
>> 8889 209396_s_at 1.899 7.47 4.01 0.00437 0.999 -3.28
>> 5005 205478_at -0.931 9.22 -3.94 0.00481 0.999 -3.30
>> 9469 209983_s_at 0.412 5.72 3.92 0.00492 0.999 -3.31
>> 2936 203409_at 0.506 6.93 3.87 0.00531 0.999 -3.32
>> 5054 205527_s_at 0.331 6.80 3.84 0.00549 0.999 -3.33
>>
>> I'm abit puzzled over the adjusted P-values. Can it really be true
>> that ALL of the adjusted P-values are 0.999, or did I make a rookie
>> mistake somewhere?
>>
>> Best Regards,
>> David Westergaard
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
More information about the Bioconductor
mailing list