[R] overdispersion + GAM
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Tue Feb 12 09:05:50 CET 2008
Gavin Simpson wrote:
> On Mon, 2008-02-11 at 14:46 -0500, Ravi Varadhan wrote:
>
>> No. Binomial data can indeed be overdispersed. See McCullagh & Nelder
>> (1989, section 4.5). Accounting for over- (or under-) dispersion in binomial and
>> Poisson distributions was, in fact, one of the original motivations for GEE-type
>> developments. See also a nice paper by Liang & McCullagh (Biometrics 1993,
>> p. 623-630), which discusses numerous examples of overdispersion in binary
>> data.
>>
>> Ravi.
>>
>
> Hi Ravi,
>
> I was very careful to say "Bernoulli" rather than "binomial". I
> understand that overdispersion can be present in Poisson or binomial
> (M>1), hence the need for a quasibinomial family function. I was,
> however, always led to believe that overdispersion in binary data was
> not possible, and that was how I interpreted the OP's statement about
> presence/absence data.
>
>
Yep. One qualification should probably be added: the statement refers
to independently and identically sampled data. The point is that you
cannot have a distribution on {0, 1} whose variance is anything but
p(1-p), where p is the mean; if you put a distribution on p and integrate
it out, you still end up with variance m(1-m), where m is the marginal mean.
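
A quick numerical check of the point about integrating out p, as a minimal sketch in R. The Beta(2, 5) mixing distribution, the sample size, and the variable names are assumptions chosen for illustration; none of them come from the thread.

```r
set.seed(1)
n <- 1e5

# Assumed mixing distribution: every observation gets its own p ~ Beta(2, 5)
p <- rbeta(n, shape1 = 2, shape2 = 5)

# One Bernoulli (0/1) draw per observation, each with its own p
y <- rbinom(n, size = 1, prob = p)

m      <- mean(y)            # marginal mean, estimates E[p] = 2/7
v_emp  <- mean((y - m)^2)    # variance of the 0/1 outcomes (divisor n)
v_bern <- m * (1 - m)        # Bernoulli variance implied by the mean

# Because y is 0/1, y^2 equals y, so mean(y^2) - m^2 reduces to m - m^2.
# The two quantities therefore agree up to floating-point error,
# whatever distribution p was drawn from.
all.equal(v_emp, v_bern)
```

Swapping in any other distribution for p on [0, 1] should give the same agreement, since the identity depends only on y taking the values 0 and 1.
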
Correlation structures can still be present and may lead to both over-
and underdispersion of the total counts or proportions. (E.g., the total
number of blacksmiths in a county in olden days would typically equal
the number of villages, which is underdispersion, whereas group phenomena,
such as either everyone or no one in a school class doing something, lead
to overdispersion of the overall proportion.)
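
The two examples can be mimicked in a small simulation, offered as a rough sketch only. The class size, number of classes, probability, village and household counts, and the helper `disp()` are all hypothetical; the dispersion measure used here is simply the sample variance of the totals divided by the binomial variance.

```r
set.seed(2)
n_unit <- 2000   # hypothetical number of classes (or counties)
k      <- 25     # hypothetical class size
p      <- 0.3    # hypothetical probability of "doing something"

# Independent pupils: class totals are Binomial(k, p)
tot_indep <- rbinom(n_unit, size = k, prob = p)

# Perfectly correlated pupils: the whole class does it (k) or nobody does (0)
tot_group <- k * rbinom(n_unit, size = 1, prob = p)

# Ratio of the observed variance of the totals to the binomial
# variance trials * phat * (1 - phat); 1 means no over/underdispersion
disp <- function(tot, trials) {
  phat <- mean(tot) / trials
  var(tot) / (trials * phat * (1 - phat))
}

disp_indep <- disp(tot_indep, k)   # close to 1
disp_group <- disp(tot_group, k)   # close to k, i.e. strong overdispersion

# Blacksmiths: assume 10 villages per county, 50 households per village,
# and exactly one blacksmith per village, so every county total is 10
tot_smith  <- rep(10, n_unit)
disp_smith <- disp(tot_smith, 10 * 50)   # 0: extreme underdispersion

c(indep = disp_indep, group = disp_group, smith = disp_smith)
```

In all three cases the individual 0/1 outcomes still have variance p(1-p); only the totals over correlated units depart from the binomial variance.
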
> This appears to have been discussed recently on the R-help list:
>
> http://finzi.psych.upenn.edu/R/Rhelp02a/archive/91242.html is a reply to
> a posting by Peter Dalgaard (in response to an original question on
> R-help - apologies, I can't seem to get to the email archives on tolstoy
> to find the start of the thread).
>
> My response was in the same vein as Peter's ">> There is no such thing
> as overdispersion for binary data." (quoted from his response to the
> OP). To be fair (for those not going to look at the thread), Peter then
> follows this up later in the thread saying (in reply to John
> Maindonald's posting) "I don't really disagree, of course. I was mainly
> being provocative."
>
> The two messages from Peter and John in that thread are very
> interesting; I'm not sure I fully understand what they are going on
> about, but I get the gist.
>
> And of course, I would be more than happy to be corrected and pointed in
> the direction of something not too technical (I'm an ecologist, not a
> statistician or mathematician) that discusses this. To that end, I'll be
> hunting out McCullagh & Nelder and the Biometrics paper you cite, Ravi,
> but if you or anyone can point to other literature, I'd be most
> grateful.
>
> All the best,
>
> G
>
>
>> ------------------------------------------------------------------------
>> Ravi Varadhan, Ph.D.
>> Assistant Professor, The Center on Aging and Health
>> Division of Geriatric Medicine and Gerontology
>> Johns Hopkins University
>> Ph: (410) 502-2619
>> Fax: (410) 614-9625
>> Email: rvaradhan at jhmi.edu
>> Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
>> ------------------------------------------------------------------------
>>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
>> Behalf Of Gavin Simpson
>> Sent: Monday, February 11, 2008 12:37 PM
>> To: anna banana
>> Cc: r-help at r-project.org
>> Subject: Re: [R] overdispersion + GAM
>>
>> On Mon, 2008-02-11 at 07:35 -0800, anna banana wrote:
>>
>>> Hi,
>>>
>>> there are a lot of messages dealing with overdispersion, but I couldn't find
>>> anything about how to test for overdispersion. I applied a GAM with binomial
>>> distribution on my presence/absence data, and would like to check for
>>> overdispersion. Does anyone know the command?
>>>
>> Bernoulli data (presence/absence of a single species, say) can't be
>> overdispersed, so there is no need to test or correct for it.
>>
>> G
>>
>>
>>> Many thanks,
>>>
>>> Anna
>>>
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907