[R] problems with glm
Dimitris Rizopoulos
dimitris.rizopoulos at med.kuleuven.be
Tue Oct 2 09:13:46 CEST 2007
You could also try the following piece of code:
form$finished <- factor(form$finished)
glmFit <- glm(finished ~ ., family = binomial, data = form[1:150000, ])
preds <- predict(glmFit, newdata = form[150001:200000, ], type = "response")
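For anyone who wants to try this without the original data, here is a
minimal sketch on simulated data. The data frame built below is only a
stand-in: the column names are borrowed from the question, but the values,
the coefficients used to simulate them, the scaled-down row counts and the
0.5 cut-off are all assumptions made for illustration.

```r
set.seed(1)

# Simulated stand-in for the poster's data frame (assumption: the real
# `form` has a binary outcome `finished` and numeric predictors).
n <- 2000
form <- data.frame(
  first   = rbinom(n, 1, 0.1),
  second  = rbinom(n, 1, 0.1),
  third   = rbinom(n, 1, 0.1),
  barrier = sample(1:14, n, replace = TRUE)
)
eta <- -2 + 0.6 * form$first + 0.7 * form$second + 1.5 * form$third -
  0.05 * form$barrier
form$finished <- factor(rbinom(n, 1, plogis(eta)))

train <- form[1:1500, ]
test  <- form[1501:2000, ]

# Fit on the training rows only; `.` expands to all other columns of `train`.
glmFit <- glm(finished ~ ., family = binomial, data = train)

# Predicted probabilities for the held-out rows.
preds <- predict(glmFit, newdata = test, type = "response")
length(preds)  # one prediction per row of `test`

# Cross-tabulate the predicted class (cut-off 0.5, an arbitrary choice)
# against the observed outcome.
predClass <- factor(as.integer(preds > 0.5), levels = 0:1)
table(predicted = predClass, observed = test$finished)
```

Tabulating the raw probabilities against the outcome would give one row per
distinct probability, so the sketch converts them to classes first.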
Note also the following:
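* since you supply the `data' argument of glm(), you should not write
the `formula' argument as "data$y ~ data$x"; just use "y ~ x", etc.
Variables written as "data$y" are taken from the full data frame rather
than from the rows passed in `data', which is why the lengths differ in
your first call.
* for predict.glm() the argument is `newdata', not `data'; an
unrecognised `data' argument is silently ignored, so you get the fitted
values for all rows used in the fit. Note also that `type = "response"'
gives you the predicted probabilities; look at ?predict.glm for more
info.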
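Both points can be checked on a small toy example. The sketch below uses
made-up data, and the names `d', `x', `y', `fitBad' and `fitGood' are
purely illustrative. It mirrors what the summary in the question already
hints at: the null deviance there is reported on 215213 degrees of freedom,
i.e. all rows were used even though `data' was a 150000-row subset.

```r
set.seed(2)

# Hypothetical toy data; `d`, `y` and `x` are illustrative names only.
d <- data.frame(x = rnorm(300))
d$y <- factor(rbinom(300, 1, plogis(0.5 * d$x)))

# Pitfall 1: `d$y` and `d$x` are looked up in the full data frame, so the
# subset passed via `data` has no effect on the fit.
fitBad <- glm(d$y ~ d$x, family = binomial, data = d[1:200, ])
nobs(fitBad)   # 300, not 200

# With plain names the variables are taken from `data`.
fitGood <- glm(y ~ x, family = binomial, data = d[1:200, ])
nobs(fitGood)  # 200

# Pitfall 2: predict.glm() has no `data` argument; it is swallowed by `...`
# and the fitted values for the training rows are returned instead.
p1 <- predict(fitGood, data = d[201:300, ], type = "response")
length(p1)     # 200

# `newdata` gives predictions for the rows that were asked for.
p2 <- predict(fitGood, newdata = d[201:300, ], type = "response")
length(p2)     # 100
```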
I hope it helps.
Best,
Dimitris
----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven
Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
http://www.student.kuleuven.be/~m0390867/dimitris.htm
----- Original Message -----
From: <stephenc at ics.mq.edu.au>
To: <r-help at stat.math.ethz.ch>
Sent: Tuesday, October 02, 2007 5:34 AM
Subject: [R] problems with glm
> I am having a couple of problems that someone may be able to shed some
> light on.
>
>
> Question 1:
>
> I am fitting a logistic model, but when I do this:
>
> glm.model = glm(as.factor(form$finished) ~ ., family=binomial, data=form[1:150000,])
>
> I get this:
>
>
> Error in model.frame(formula, rownames, variables, varnames, extras,
> extranames, :
> variable lengths differ (found for 'barrier')
>
>
> which is very strange because when I name the predictive factors
> like this:
>
> glm.model = glm(as.factor(form$finished) ~ form$first + form$second +
>   form$third + form$barrier, family=binomial, data=form[1:150000,])
>
> It produces a model:
>
> Call:
> glm(formula = as.factor(form$finished) ~ form$first + form$second +
>     form$third + form$barrier, family = binomial, data = form[1:150000, ])
>
> Deviance Residuals:
>     Min       1Q   Median       3Q      Max
> -3.0884  -0.4932  -0.3951  -0.3006   2.7135
>
> Coefficients:
>               Estimate Std. Error  z value Pr(>|z|)
> (Intercept)  -2.957831   0.021446 -137.920  < 2e-16 ***
> form$first    0.624463   0.078036    8.002 1.22e-15 ***
> form$second   0.754057   0.080787    9.334  < 2e-16 ***
> form$third    7.718261   0.078532   98.281  < 2e-16 ***
> form$barrier -0.058185   0.002175  -26.751  < 2e-16 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> (Dispersion parameter for binomial family taken to be 1)
>
> Null deviance: 144850 on 215213 degrees of freedom
> Residual deviance: 133292 on 215209 degrees of freedom
> AIC: 133302
>
> Number of Fisher Scoring iterations: 5
>
> Any idea why the first glm call doesn't work?
>
> Second Question:
>
> Now I want to predict, so I do this:
>
> pred <- predict(glm.model,data=form[150001:200000,],type="response")
>
> but when I try to use it I get this:
>
>> pred <- predict(glm.model,data=form[150001:200000,],type="response")
>> t = table(pred,form$finished[150001:200000])
> Error in table(pred, form$finished[150001:2e+05]) :
> all arguments must have the same length
>
> and when I do this it confirms my pred is not 50000 long!
>
>> length(pred)
> [1] 215214
>
> It doesn't look as though my selection of rows has worked at all.
> Does anyone know why?
>
> Stephen