[BioC] Problem fitting a linear model using limma
James W. MacDonald
jmacdon at uw.edu
Wed Mar 7 18:01:06 CET 2012
Hi Joaquin,
On 3/7/2012 11:08 AM, Joaquin Martinez wrote:
> Dear All,
>
> My colleague Ben and I have fitted a linear model to microarray data using
> the limma package but we get "coefficients not estimable" for the last two
> columns in the design matrix during the fit. As a consequence when trying
> to fit our contrast matrix we get the following error message, "Error in
> contrasts.fit(fit, contrasts = A) : trying to take contrast of
> non-estimable coefficient." We have discovered that if we simply reorder
> the design matrix columns and contrasts matrix rows, we encounter the same
> error. We would very much appreciate if someone could explain to us why the
> not-estimable coefficients arise, and how to correct the problem.
>
> We are following the guide (section 8.2) for limma 3.10.0 using this
> version of R 2.14.0 (sessionInfo at end of email). We have 6 different
> types of samples (Puninf, P86, P163, Muninf, M86, M163), and 3 biological
> replicates of each (1,2,3; 4,5,6; 7,8,9; 10,11,12; 13,14,15; 16,17,18).
Yes, but your targets file has numbers appended to the sample types,
which makes R think they are different. Hypothetically you would have
noticed this when using the modelMatrix() function:
> modelMatrix(targets, ref = "Puninf.1")
Found unique target names:
M163.16 M163.17 M163.18 M86.13 M86.14 M86.15 Muninf.10 Muninf.11
Muninf.12 P163.7 P163.8 P163.9 P86.4 P86.5 P86.6 Puninf.1 Puninf.2 Puninf.3
This indicates that limma thinks all of your replicates are different
sample types. This won't work out (as you have already found) because
the model will be over specified, which simply means that you cannot
estimate all the parameters you have in the model. This is the same as
saying you can't determine the values of x and y in the formula x + y =
3 (but if you also know that x - y = -1, then you can solve for both x
and y).
Now back to the problem at hand. If you remove the numbers from your
sample types,
targets$Cy3 <- sapply(strsplit(targets$Cy3, "\\."), function(x) x[1])
targets$Cy5 <- sapply(strsplit(targets$Cy5, "\\."), function(x) x[1])
then you will create the correct model matrix
> modelMatrix(targets, ref="Puninf")
Found unique target names:
M163 M86 Muninf P163 P86 Puninf
M163 M86 Muninf P163 P86
[1,] 0 0 1 0 0
[2,] 0 0 0 0 1
[3,] 0 0 0 -1 0
[4,] 0 -1 1 0 0
[5,] 1 0 -1 0 0
[6,] 0 1 0 0 -1
[7,] -1 0 0 1 0
[8,] 0 0 -1 0 0
[9,] 0 0 0 0 -1
[10,] 0 0 0 1 0
[11,] 0 1 -1 0 0
[12,] -1 0 1 0 0
[13,] 0 -1 0 0 1
[14,] 1 0 0 -1 0
[15,] 0 0 1 0 0
[16,] 0 0 0 0 1
[17,] 0 0 0 -1 0
[18,] 0 -1 1 0 0
[19,] 1 0 -1 0 0
[20,] 0 1 0 0 -1
[21,] -1 0 0 1 0
Which will compare all samples to Puninf.
Best,
Jim
>
> # The targets file looks like this
>
> targets<- structure(list(SlideNumber = c(29L, 30L, 31L, 32L, 33L, 34L,
> 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 51L, 46L, 47L,
> 48L, 50L), FileName = c("BE T22h slide 29", "BE T22h slide 30",
> "BE T22h slide 31", "BE T22h slide 32", "BE T22h slide 33", "BE T22h
> slide 34",
> "BE T22h slide 35", "BE T22h slide 36", "BE T22h slide 37", "BE T22h
> slide 38",
> "BE T22h slide 39", "BE T22h slide 40", "BE T22h slide 41", "BE T22h
> slide 42",
> "BE T22h slide 43", "BE T22h slide 44", "BE T22h slide 51", "BE T22h
> slide 46",
> "BE T22h slide 47", "BE T22h slide 48", "BE T22h slide 50"),
> Cy3 = c("Puninf.1", "Puninf.1", "P163.7", "M86.13", "Muninf.10",
> "P86.4", "M163.16", "Muninf.11", "P86.5", "Puninf.2", "Muninf.11",
> "M163.17", "M86.14", "P163.8", "Puninf.3", "Puninf.3", "P163.9",
> "M86.15", "Muninf.12", "P86.6", "M163.18"), Cy5 = c("Muninf.10",
> "P86.4", "Puninf.1", "Muninf.10", "M163.16", "M86.13", "P163.7",
> "Puninf.2", "Puninf.2", "P163.8", "M86.14", "Muninf.11",
> "P86.5", "M163.17", "Muninf.12", "P86.6", "Puninf.3", "Muninf.12",
> "M163.18", "M86.15", "P163.9")), .Names = c("SlideNumber",
> "FileName", "Cy3", "Cy5"), class = "data.frame", row.names = c(NA,
> -21L))
>
> # for readability we also paste it in here...
> #> targets
> # SlideNumber FileName Cy3 Cy5
> # 1 29 BE T22h slide 29 Puninf.1 Muninf.10
> # 2 30 BE T22h slide 30 Puninf.1 P86.4
> # 3 31 BE T22h slide 31 P163.7 Puninf.1
> # 4 32 BE T22h slide 32 M86.13 Muninf.10
> # 5 33 BE T22h slide 33 Muninf.10 M163.16
> # 6 34 BE T22h slide 34 P86.4 M86.13
> # 7 35 BE T22h slide 35 M163.16 P163.7
> # 8 36 BE T22h slide 36 Muninf.11 Puninf.2
> # 9 37 BE T22h slide 37 P86.5 Puninf.2
> # 10 38 BE T22h slide 38 Puninf.2 P163.8
> # 11 39 BE T22h slide 39 Muninf.11 M86.14
> # 12 40 BE T22h slide 40 M163.17 Muninf.11
> # 13 41 BE T22h slide 41 M86.14 P86.5
> # 14 42 BE T22h slide 42 P163.8 M163.17
> # 15 43 BE T22h slide 43 Puninf.3 Muninf.12
> # 16 44 BE T22h slide 44 Puninf.3 P86.6
> # 17 51 BE T22h slide 51 P163.9 Puninf.3
> # 18 46 BE T22h slide 46 M86.15 Muninf.12
> # 19 47 BE T22h slide 47 Muninf.12 M163.18
> # 20 48 BE T22h slide 48 P86.6 M86.15
> # 21 50 BE T22h slide 50 M163.18 P163.9
>
> # The design matrix is computed like this design<- modelMatrix(targets,
> ref = "Puninf.1")
> # but here is the structure if it is helpful
> design<- structure(c(0, 0, 0, 0, 1, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 1, 0,
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 0, 0, 1, 0, -1, 0, 0, 0, -1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, -1,
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 0, 0, 0, -1, 0, 1, 0, 1, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, -1,
> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 0, 0, 1, 0, 0, 1, -1, 0, 0, 0, 0, -1, 0, 0, 0, 1, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
> 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 1, 0, 1, 0, 0, 0, -1, 0, 0,
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> -1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0,
> 0, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, -1, 1, 0, 0, 0, 0), .Dim = c(21L,
> 17L), .Dimnames = list(c("slide 29", "slide 30", "slide 31",
> "slide 32", "slide 33", "slide 34", "slide 35", "slide 36", "slide 37",
> "slide 38", "slide 39", "slide 40", "slide 41", "slide 42", "slide 43",
> "slide 44", "slide 51", "slide 46", "slide 47", "slide 48", "slide 50"
> ), c("M163.16", "M163.17", "M163.18", "M86.13", "M86.14", "M86.15",
> "Muninf.10", "Muninf.11", "Muninf.12", "P163.7", "P163.8", "P163.9",
> "P86.4", "P86.5", "P86.6", "Puninf.2", "Puninf.3")))
>
>
> # Fit linear model - commented out becuase we can't really send all the
> data in MA
> # fit<- lmFit(MA, design = design)
> # Coefficients not estimable: Puninf.2 Puninf.3
>
> # Make a contrast matrix:
> A<-
> makeContrasts(PuninfvsMuninf=(Muninf.10+Muninf.11+Muninf.12-Puninf.2-Puninf.3)/3,
> levels = design)
>
> # and now fit the contrast
> con.fit<- contrasts.fit(fit, contrasts = A)
> Error in contrasts.fit(fit, contrasts = A) : trying to take contrast of
> non-estimable coefficient
>
> # now change the column order in the design matrix and row order of
> contrast matrix
> # we create a new sort index - arbitrarily placing the last two columns in
> the original
> # design matrix in the middle of the new one. Ditto for the rows of the
> contrast matrix.
> n<- nrow(A)
> newIndex<- c(1:5, n-1, n, 6:(n-2))
> designB<- design[,newIndex]
> B<- A[newIndex,]
> attributes(B)<- attributes(A)
>
> # create a new fit, but we get the same knd of error, the last two columns
> are
> # identified as non-estimable
> fitB<- lmFit(X$MA, design = designB)
> #Coefficients not estimable: P86.5 P86.6
> #Warning message:
> #Partial NA coefficients for 16278 probe(s)
>
>
> As far as we can tell, it's the trailing columns that get singled out, but
> we don't know why. Hopefully the information about is detailed enough, but
> we can provide more information if necessary to help troubleshooting.
>
> Thanks in advance,
>
> Joaquin
>
>> sessionInfo()
> R version 2.14.0 (2011-10-31)
> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] limma_3.10.0
>
> loaded via a namespace (and not attached):
> [1] tools_2.14.0
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
More information about the Bioconductor
mailing list