[R] fitted.values from zeroinfl (pscl package)
Achim Zeileis
Achim.Zeileis at wu-wien.ac.at
Mon Feb 18 14:40:07 CET 2008
On Mon, 18 Feb 2008, Sarah J Thomas wrote:
> Hello all:
>
> I have a question regarding the fitted.values returned from the
> zeroinfl() function. The values seem to be nearly identical to those
> fitted.values returned by the ordinary glm(). Why is this? Shouldn't
> they be more "zero-inflated"?
>
> I construct a zero-inflated series of counts, called Y, like so:
To make this reproducible, I set the random seed with
set.seed(123)
before running your source code:
b <- c(1.5, -2)   # coefficients of the Poisson (count) part
g <- c(-3, 1)     # coefficients of the logit (zero-inflation) part
x <- runif(100)   # x is the covariate
X <- cbind(1, x)
p <- exp(X %*% g) / (1 + exp(X %*% g))  # probability of a structural zero
m <- exp(X %*% b)  # log-link for the mean of the Poisson process
Y <- rep(0, 100)
u <- runif(100)
for (i in 1:100) {
  if (u[i] < p[i]) {
    Y[i] <- 0                # zero-inflated observation
  } else {
    Y[i] <- rpois(1, m[i])   # ordinary Poisson draw
  }
}
# now let's compare the fitted.values from zeroinfl()
# and from glm()
z1 <- glm(Y ~ x, family = poisson)
library("pscl")            # provides zeroinfl()
z2 <- zeroinfl(Y ~ x | x)  # Poisson is the default distribution
[snip]
> You can see that they are almost identical... and the fitted.values from
> zeroinfl don't seem to be zero-inflated at all! What is going on?
Well, let's see how zero-inflated your observations are:
R> sum(u < p)
[1] 2
Wow, two (!) observations that have been zero-inflated. Let's see what the
probability of observing a zero would have been for these two anyway:
R> dpois(0, m[u < p])
[1] 0.3147816 0.1409670
which is not so low, particularly for the first one, so these zeros could
well have come from the Poisson part anyway.
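Under this design the expected number of zero-inflated observations is simply the sum of the inflation probabilities, sum(p). Because x lies in (0, 1), every p lies between plogis(-3) (about 0.047) and plogis(-2) (about 0.119), so even in expectation only roughly 5 to 12 of the 100 observations are structural zeros. The following is a minimal sketch (an editor's addition, not part of the original exchange) that reuses the seed and the draw order from the post; plogis() is used as an equivalent way of writing exp(eta)/(1 + exp(eta)).

```r
# Regenerate the inflation part of the design from the post:
# same seed, and x and u are drawn in the same order as in the original code
set.seed(123)
g <- c(-3, 1)
x <- runif(100)
X <- cbind(1, x)
p <- plogis(drop(X %*% g))   # P(structural zero), same as exp(eta) / (1 + exp(eta))
u <- runif(100)

# Expected vs. realised number of inflated (structural) zeros
expected_inflated <- sum(p)
realised_inflated <- sum(u < p)

range(p)   # must lie within (plogis(-3), plogis(-2))
c(expected = expected_inflated, realised = realised_inflated)
```

Whatever the realised count, the design itself only ever asks for a handful of extra zeros out of 100.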
Overall, you've got
R> sum(Y < 1)
[1] 23
zeros in that data set and the expected number of zeros in a Poisson GLM
is
R> sum(dpois(0, fitted(z1)))
[1] 23.35615
So you have observed *fewer* zeros than expected under a Poisson GLM.
Surely, this is not the kind of data that zero-inflated models have been
developed for.
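For contrast, data that do call for a zero-inflated model need a much larger inflation probability. The sketch below (an editor's illustration, not from the original thread) raises the zero-inflation intercept to a hypothetical g = c(0, 1), chosen arbitrarily, so that p lies between 0.5 and about 0.73. The simulation is vectorised with ifelse(), so its random draws differ from the loop above. With such a design the observed number of zeros should clearly exceed the number expected under a Poisson GLM.

```r
library("pscl")   # provides zeroinfl()

# Hypothetical design with substantial zero inflation (g chosen for illustration)
set.seed(123)
n <- 100
b <- c(1.5, -2)                  # count part, as in the original post
g <- c(0, 1)                     # zero-inflation part: intercept raised from -3 to 0
x <- runif(n)
X <- cbind(1, x)
p <- plogis(drop(X %*% g))       # P(structural zero), between 0.5 and about 0.73
m <- exp(drop(X %*% b))          # mean of the Poisson count process
zero <- runif(n) < p             # which observations are structural zeros
Y <- ifelse(zero, 0, rpois(n, m))

z1 <- glm(Y ~ x, family = poisson)
z2 <- zeroinfl(Y ~ x | x)

# Observed zeros vs. zeros expected under the Poisson GLM
c(observed = sum(Y == 0), expected_glm = sum(dpois(0, fitted(z1))))
```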
> Ultimately I want these fitted.values for a goodness of fit type of test
> to see if the zeroinfl model is needed or not for a given data series.
> With these fitted.values as they are, I am rejecting the assumption of a
> zero-inflated model even when the data really are zero-inflated.
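A further reason why the fitted values cannot show the inflation: as far as I know, fitted() for a pscl zeroinfl() model returns the estimated mean of the response, (1 - pi) * mu, where pi is the zero-inflation probability and mu the count mean. That is the same quantity a Poisson GLM's fitted values estimate, so the two are expected to be close. For a check on zeros, the per-observation zero probabilities pi + (1 - pi) * exp(-mu) are the relevant quantity. The sketch below is an editor's addition: it reruns the post's simulation, obtains pi and mu via predict(..., type = "zero") and predict(..., type = "count"), and uses all.equal() to check the mean relationship rather than assuming it.

```r
library("pscl")   # provides zeroinfl() and its predict() method

# Data-generating process from the original post (same seed, same draw order)
set.seed(123)
b <- c(1.5, -2)
g <- c(-3, 1)
x <- runif(100)
X <- cbind(1, x)
p <- exp(X %*% g) / (1 + exp(X %*% g))
m <- exp(X %*% b)
Y <- rep(0, 100)
u <- runif(100)
for (i in 1:100) {
  if (u[i] < p[i]) Y[i] <- 0 else Y[i] <- rpois(1, m[i])
}
z1 <- glm(Y ~ x, family = poisson)
z2 <- zeroinfl(Y ~ x | x)

pi0 <- predict(z2, type = "zero")    # estimated zero-inflation probability
mu  <- predict(z2, type = "count")   # estimated mean of the count component

# fitted() gives the unconditional mean E[Y] = (1 - pi0) * mu, not a zero probability
all.equal(unname(fitted(z2)), unname((1 - pi0) * mu))

# Probability of a zero under the fitted zero-inflated Poisson model
p0_zip <- pi0 + (1 - pi0) * dpois(0, mu)
c(observed     = sum(Y == 0),
  expected_glm = sum(dpois(0, fitted(z1))),
  expected_zip = sum(p0_zip))
```

Comparing observed zeros with expected_glm and expected_zip is closer to the intended goodness-of-fit question than comparing fitted means.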
Maybe you ought to think about useful data-generating processes first
before designing tests or criticizing software...
Z