[BioC] Very low P-values in limma
Wolfgang Huber
whuber at embl.de
Thu Oct 22 15:22:32 CEST 2009
Hi Paul
how does
hist(topTable(ebayes, coef=2, adjust="BH", n=nrow(ebayes))$P.Value,
breaks=100)
look like?
Best wishes
Wolfgang
Paul Geeleher ha scritto:
> Hi Wolfgang,
>
> Thanks alot for your reply, I removed the offending line and the
> p-values are still much the same. Actually the reason I had that line in
> was that I generated a plot (PCA I think) that wouldn't accept negative
> values.
>
> -Paul.
>
>
>
> On Thu, Oct 22, 2009 at 1:21 PM, Wolfgang Huber <whuber at embl.de
> <mailto:whuber at embl.de>> wrote:
>
> Dear Paul
>
> please do not do something like:
>
>
> # this line of code will remove any negative values which cause errors
> # further on
> RG$G[RG$G<0] <- 0
>
> First, vsn is intended to work with data that is not mutilated like
> that. There is ample literature on this topic. The vsn vignette is a
> good start.
>
> Second, when you use a parametric test (which is what
> lmFit/eBayes/topTable do), do not arbitrarily replace parts of your
> data with phantasy values, it can invalidate the assumptions
> underlying the p-value computation.
>
> I am not sure whether this already fixes all your problems, but this
> is certainly one of them.
>
> Best wishes
> Wolfgang
>
>
>
> Paul Geeleher ha scritto:
>
> The value of df.prior is 3.208318. Hopefully somebody can help
> me here
> because I'm still really at a loss on why these values are so low.
>
> Here's the script. I'm fairly sure it conforms to the documentation:
>
> library(affy)
>
> setwd('/home/paul/PhD_Stuff/miRNA_guilleme_project/HERArrays/')
>
> # create a color pallete for boxplots
> cols <- brewer.pal(12, "Set3")
>
> # This defines the column name of the mean Cy3 foreground intensites
> #Cy3 <- "F532 Median"
> # background corrected value
> Cy3 <- "F532 Median - B532"
>
> # This defines the column name of the mean Cy3 background intensites
> Cy3b <- "B532 Mean"
>
> # Read the targets file (see limmaUsersGuide for info on how to
> create this)
> targets <- readTargets("targets.csv")
>
> # read values from gpr file
> # because there is no red channel, read Rb & R to be the same
> values as G
> and Gb
> # this is a type of hack that allows the function to work
> RG <- read.maimages( targets$FileName,
> source="genepix",columns=list(R=Cy3,G=Cy3, Rb=Cy3b, Gb=Cy3b))
>
> # remove the extraneous values red channel values
> RG$R <- NULL
> RG$Rb <- NULL
>
> # this line of code will remove any negative values which cause
> errors
> further on
> RG$G[RG$G<0] <- 0
>
> # create a pData for the data just read
> # this indicates which population each array belongs to
> # a are "her-" and b are "her+" (almost certain)
> pData <- data.frame(population = c('a', 'a', 'a', 'a', 'b', 'b',
> 'b'))
> rownames(pData) <- RG$targets$FileName
>
> # create design matrix
> design <- model.matrix(~factor(pData$population))
>
> # In my .gpr files all miRNAs contain the string "-miR-" or
> "-let-" in their
> name
> # so the grep function can be used to extract all of these, removing
> # all control signals and printing buffers etc.
> # You need to check your .gpr files to find which signals you should
> extract.
> miRs <- c(grep("-miR-", RG$genes$Name), grep("-let-",
> RG$genes$Name))
> RG.final <- RG[miRs, ]
>
> # load vsn library, this contains the normalization functions
> library('vsn')
>
> # Do VSN normalization and output as vns object
> mat <- vsnMatrix(RG.final$G)
>
> # insert rownames into matrix with normalized data
> # this will mean that the gene names will appear in our final output
> rownames(mat at hx) <- RG.final$genes$Name
>
> # my .gpr files contain 4 "within-array replicates" of each miRNA.
> # We need to make full use of this information by calculating
> the duplicate
> correlation
> # in order to use the duplicateCorrelation() function below,
> # we need to make sure that the replicates of each gene appear
> in sequence
> in this matrix,
> # so we sort the normalized values so the replicate groups of 4
> miRs appear
> in sequence
> mat at hx <- mat at hx[order(rownames(mat at hx)), ]
>
> # calculate duplicate correlation between the 4 replicates on
> each array
> corfit <- duplicateCorrelation(mat, design, ndups=4)
>
> # fit the linear model, including info on duplicates and correlation
> fit <- lmFit(mat, design, ndups=4, correlation=corfit$consensus)
>
> # calculate values using ebayes
> ebayes <- eBayes(fit)
>
> # output a list of top differnetially expressed genes...
> topTable(ebayes, coef = 2, adjust = "BH", n = 10)
>
>
> -Paul.
>
>
> On Thu, Oct 22, 2009 at 12:09 AM, Wei Shi <shi at wehi.edu.au
> <mailto:shi at wehi.edu.au>> wrote:
>
> Dear Paul:
>
> The low p-values you have got are not surprising to me. I
> have got even
> lower FDR adjusted p-values than yours. This just means you got
> significantly differentially expressed genes. But on the
> other hand, I did
> see much higher adjusted p-values in some of my microarray
> analyses. In that
> case, I will explore the data in more depth such as looking
> at batch effect
> etc.
>
> Cheers,
> Wei
>
>
> Paul Geeleher wrote:
>
> Hi folks, I'm analyzing microRNA data using limma and
> I'm wondering about
> the validity of the p-values I'm getting out. Its a
> simple 'Group A Vs
> Group
> B' experimental design. 4 arrays in one group, 3 in the
> other and 4
> duplicate spots for each miRNA on each array.
>
> The lowest adjusted p-values in the differential
> expression analysis are
> in
> the region of 10^-7.
>
> Its been pointed out to me that plugging the point
> values from each sample
> into a regular t-test you get p=0.008, which then also
> needs to take the
> multiple test hit. Can anybody explain why limma is
> giving me such lower
> values and if they are valid?
>
> I can provide more information if required.
>
> Thanks,
>
> Paul.
>
>
>
>
>
>
> --
>
> Best wishes
> Wolfgang
>
> -------------------------------------------------------
> Wolfgang Huber
> EMBL
> http://www.embl.de/research/units/genome_biology/huber
> -------------------------------------------------------
>
>
>
>
> --
> Paul Geeleher
> School of Mathematics, Statistics and Applied Mathematics
> National University of Ireland
> Galway
> Ireland
>
--
Best wishes
Wolfgang
-------------------------------------------------------
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
More information about the Bioconductor
mailing list