[R] comparing matched proportions using glm
Corry Gellatly
corry.gellatly at newcastle.ac.uk
Mon Oct 8 13:51:38 CEST 2007
Thanks very much for your reply Chuck, I have a quick follow up
question. You mention putting the data into a 2x2x3 for log-linear
model, however my lists have many more than 3 strata, actually
thousands. I am trying to work out whether the proportions in list 1
tend to be equal to the proportions in list 2, in a kind of matched
pairs proportional test. Is the log-linear approach possible with a
2x2x1000 table, for example? Or would it be better to pursue the glm
route, using the surrogate Poisson model, as you suggested?
Best regards,
Corry
>-----Original Message-----
>From: Charles C. Berry [mailto:cberry at tajo.ucsd.edu]
>Sent: 04 October 2007 21:47
>To: Corry Gellatly
>Cc: r-help at r-project.org
>Subject: Re: [R] comparing matched proportions using glm
>
>On Thu, 4 Oct 2007, Corry Gellatly wrote:
>
>>
>> Dear R users,
>>
>> Is it possible to use a generalized linear model to do a binomial
>> comparison of one list of proportions with a matched list of
>> proportions to test for a difference?
>>
>> So, for example:
>>
>> list 1 list 2
>>
>> a1 | b1 a2 | b2
>>
>> 3 | 4 7 | 9
>> 6 | 7 5 | 1
>> 9 | 1 3 | 1
>>
>>
>> I want to compare list 1 with list 2 and the samples are matched.
>
>
>Meaning that
>
> 3 4 7 9
>
>are the _counts_ in one stratum of three in all?
>
>And you have an hypothesis that claims the proportions are
>equal in each stratum??
>
>The obvious candidate for that setup is a log-linear model for
>the counts in a 2 by 2 by 3 table.
>
>See
>
> ?loglin
>
>and
>
> ?loglm (in MASS)
>
>and the refernces therein.
>
>You can do this type of work in glm() if you understand
>surrogate Poisson models as outlined in
>
>McCullagh P. and Nelder, J. A. (1989) Generalized Linear
>Models. London:
>Chapman and Hall.
>
>HTH,
>
>Chuck
>
>> Obviously, I could add the columns and do a binomial test, i.e.
>> prop.test(c(18,15),c(30,26)), however, I have a large
>dataset so this
>> would reduce the power of my analysis. I could compare the
>ratios, i.e.
>> a1/(a1+b1) compared to a2/(a2+b2) for the samples in each list,
>> however, this does not account for the difference in sample sizes
>> between samples in each list.
>>
>> I have tried a glm where I bind a2 and b2 as the y variable, i.e.
>> y<-cbind(a2,b2) and also bind a1 and b1 as the x variable, i.e.
>> y<-cbind(a1,b1) and run <-glm(y~x,binomial)
>>
>> I get this type of output:
>>
>> Call:
>> glm(formula = y ~ x, family = binomial)
>>
>> Deviance Residuals:
>> Min 1Q Median 3Q Max
>> -3.20426 -0.72686 -0.01822 0.68320 4.05035
>>
>> Coefficients:
>> Estimate Std. Error z value Pr(>|z|)
>> (Intercept) 0.178369 0.186421 0.957 0.339
>> xa1 0.008109 0.017430 0.465 0.642
>> xb1 -0.026666 0.018153 -1.469 0.142
>>
>> (Dispersion parameter for binomial family taken to be 1)
>>
>> Null deviance: 565.14 on 467 degrees of freedom
>> Residual deviance: 559.69 on 465 degrees of freedom
>> AIC: 1883.3
>>
>> Number of Fisher Scoring iterations: 3
>>
>>
>> Is this output meaningful? It seems that y is not compared directly
>> with x, but rather compared with a1 and b1, which is not intended?
>>
>> I wonder if this is a suitable approach to the problem? I'll be very
>> grateful for any advice or suggestions.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>Charles C. Berry (858) 534-2098
> Dept of
>Family/Preventive Medicine
>E mailto:cberry at tajo.ucsd.edu UC San Diego
>http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San
>Diego 92093-0901
>
>
>
More information about the R-help
mailing list