[R] Understanding lm-based analysis of fractional factorial experiments
Kjetil Kjernsmo
kjekje at ifi.uio.no
Wed Mar 6 11:46:16 CET 2013
All,
I have just returned to R after a decade of absence, and it is good to
see that R has become such a great success! I'm trying to bring Design
of Experiments into some aspects of software performance evaluation, and
to teach myself that, I picked up "Experiments: Planning, Analysis and
Optimization" by Wu and Hamada. I try to reproduce an analysis in the
book using lm, but have to conclude I don't understand what lm does in
this context, even though I end up at the desired result. I'm currently
using R 2.15.2 on a recent Fedora system, but I get the same result on
Debian Wheezy and Debian Squeeze. I think the discussion below can be
followed without having the book at hand though.
I'm working with tables 5.2 and 5.5 in the above mentioned book. Table
5.2 contains data from the "Leaf spring experiment". The dataset is also
in this zip file:
ftp://ftp.wiley.com/public/sci_tech_med/experiments-planning/data%20sets.zip
I've learned from the book that the effects can be found using a linear
model and double the coefficients. So, I do
> leaf <-
read.table("/ifi/bifrost/a03/kjekje/fag/experimental-planning/book-datasets/LeafSpring
table 5.2.dat", col.names=c("B", "C", "D", "E", "Q", paste("r", 1:3,
sep=""), "yavg", "ssq", "lnssq"))
> leaf.lm <- lm(yavg ~ B * C * D * E * Q, data=leaf)
> leaf.lm
Call:
lm(formula = yavg ~ B * C * D * E * Q, data = leaf)
Coefficients:
(Intercept) B+ C+ D+
E+
7.54000 0.07003 0.32333 -0.09668
0.07668
Q+ B+:C+ B+:D+ C+:D+
B+:E+
-0.33670 0.01335 0.11995 0.02335
NA
C+:E+ D+:E+ B+:Q+ C+:Q+
D+:Q+
NA NA 0.22915 -0.25745
0.28255
E+:Q+ B+:C+:D+ B+:C+:E+ B+:D+:E+
C+:D+:E+
0.05415 NA NA NA
NA
B+:C+:Q+ B+:D+:Q+ C+:D+:Q+ B+:E+:Q+
C+:E+:Q+
0.04160 -0.16160 -0.18840 NA
NA
D+:E+:Q+ B+:C+:D+:E+ B+:C+:D+:Q+ B+:C+:E+:Q+
B+:D+:E+:Q+
NA NA NA NA
NA
C+:D+:E+:Q+ B+:C+:D+:E+:Q+
NA NA
(seems there is little I can do about the line breaks here, sorry)
However, the book (table 5.5), has 0.221 for the main effect of B and
0.176, and the above is neither this, nor half of it. Now, I can
reproduce what's in the book with
> lm(yavg ~ B, data=leaf)
Call:
lm(formula = yavg ~ B, data = leaf)
Coefficients:
(Intercept) B+
7.5254 0.2213
> lm(yavg ~ C, data=leaf)
Call:
lm(formula = yavg ~ C, data = leaf)
Coefficients:
(Intercept) C+
7.5479 0.1763
Assuming lm does in fact double the coefficient in this case, but here
the intercept varies, which doesn't seem correct, nor can I as trivially
find the interactions the same way.
Now, I try the effects() function, and get familiar numbers:
> effects(leaf.lm)
(Intercept) B+ C+ D+ E+ Q+
-30.54415 -0.44250 0.35250 -0.05750 -0.20750 -0.51920
B+:C+ B+:D+ C+:D+ B+:Q+ C+:Q+ D+:Q+
-0.03415 -0.03915 0.07085 -0.16915 0.33085 -0.10755
E+:Q+ B+:C+:Q+ B+:D+:Q+ C+:D+:Q+
0.05415 -0.02080 0.08080 -0.09420
and indeed, I have verified that effects(leaf.lm)/2 gives me the
expected result.
So, I have found the correct answer, but I don't understand why. I have
read the documentation for effects() as well as looked through the
relevant chapter in "Statistical Models in S", but from that all I got
was that I suppose there is a hint in the phrase "the effects are the
uncorrelated single-degree-of-freedom", and that is somewhat different
from the coefficients, but I can't make out from the book (Wu & Hamada)
why the coefficients should be any different than the effects, to the
contrary, it is quite clear from equation (5.8) in the book that the
coefficients they use are effects(leaf.lm)/4.
So, there are at least two points of confusion here, one is how coef()
differs from effects() in the case of fractional factorial experiments,
and the other is the factor 1/4 between the coefficients used by Wu &
Hamada and the values returned by effects() as I would think from theory
I've read that it should be a factor 2.
Best regards,
Kjetil
--
Kjetil Kjernsmo
PhD Research Fellow, University of Oslo, Norway
Semantic Web / SPARQL Query Federation
kjekje at ifi.uio.no http://www.kjetil.kjernsmo.net/
More information about the R-help
mailing list