[R] fitted.values from zeroinfl (pscl package)

Mon Feb 18 14:40:07 CET 2008

On Mon, 18 Feb 2008, Sarah J Thomas wrote:

> Hello all:
>
> I have a question regarding the fitted.values returned from the
> zeroinfl() function. The values seem to be nearly identical to those
> fitted.values returned by the ordinary glm(). Why is this, shouldn't
> they be more "zero-inflated"?
>
> I construct a zero-inflated series of counts, called Y, like so:

To make this reproducible, I set the random seed to

set.seed(123)

in advance and then ran your source code

b= as.vector(c(1.5, -2))
g= as.vector(c(-3, 1))
x <- runif(100) # x is the covariate
X <- cbind(1,x)

p <- exp(X%*%g)/(1+exp(X%*%g))
m <- exp(X%*%b)   # log-link for the mean process
                  # of the Poisson
Y <- rep(0, 100)

u <- runif(100)
for(i in 1:100) {
    if( u[i] < p[i] ) { Y[i] = 0 }
    else { Y[i] <- rpois(1, m[i]) }
}

# now let's compare the fitted.values from zeroinfl()
# and from glm()

z1 <- glm(Y ~ x, family=poisson)
z2 <- zeroinfl(Y ~ x|x) #poisson is the default

[snip]

> You can see that they are almost identical... and the fitted.values from
> zeroinfl don't seem to be zero-inflated at all! What is going on?

Well, let's see how zero inflated your observations are:

R> sum(u < p)
[1] 2

Wow, two (!) observations that have been zero-inflated. Let's see how much
the probability for observing a zero would have been

R> dpois(0, m[u < p])
[1] 0.3147816 0.1409670

which is not so low, in particular for the first one.

Overall, you've got

R> sum(Y < 1)
[1] 23

zeros in that data set and the expected number of zeros in a Poisson GLM
is

R> sum(dpois(0, fitted(z1)))
[1] 23.35615

So you have observed *less* zeros than expected by a Poisson GLM. Surely,
this is not the kind of data that zero-inflated models have been developed
for.

> Ultimately I want these fitted.values for a goodness of fit type of test
> to see if the zeroinfl model is needed or not for a given data series.
> With these fitted.values as they are, I am rejecting assumption of a
> zero-inflated model even when the data really are zero-inflated.

Maybe you ought to think about useful data-generating processes first
before designing tests or criticizing software...
Z