[R] How many samples ACTUALLY used in regression?
Federico Calboli
f.calboli at imperial.ac.uk
Mon Mar 18 16:18:16 CET 2013
On 18 Mar 2013, at 15:07, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
> On 18/03/2013 14:51, Cade, Brian wrote:
>> Perhaps a crude but reliable way is to check the number of residuals, e.g.,
>> length(my.model$resid).
>
> Not very reliable (what about zero weights, for example?), and the component is usually 'residuals'.
>
> No one has so far mentioned nobs(), which seems to me to be the closest.
Given a my.data where 3 out of 100 rows will be discarded due to NAs:
test = lm(formula = y ~ x + w, data = my.data, model = TRUE)
nobs(test)
[1] 97 # as expected
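
For anyone wanting to reproduce this, here is a minimal sketch with simulated data. The data frame, the seed and the 'wts' column are made up for illustration and are not my real data. It also shows the zero-weight case raised above, where the number of residuals and nobs() disagree:

```r
# Simulated stand-in for my.data: 100 rows, 3 of which contain an NA
set.seed(1)
my.data <- data.frame(y = rnorm(100), x = rnorm(100), w = rnorm(100))
my.data$x[c(5, 20)] <- NA
my.data$w[50] <- NA

test <- lm(y ~ x + w, data = my.data, na.action = na.omit)
nobs(test)                 # 97: rows left after dropping NAs
length(residuals(test))    # 97
nrow(model.frame(test))    # 97

# Hypothetical weights column with two zero weights on complete rows
my.data$wts <- rep(1, 100)
my.data$wts[1:2] <- 0

test.w <- lm(y ~ x + w, data = my.data, weights = wts, na.action = na.omit)
nobs(test.w)               # 95: nobs() for lm counts non-zero weights only
length(residuals(test.w))  # 97: residuals are still returned for zero-weight rows
```

So for lm() the residual count and nobs() agree only when there are no zero weights.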
But if I substitute 1 NA in one of the rows of the housing data:
house.plr = polr(formula = Sat ~ Infl + Type + Cont, data = housing, weights = Freq)
nobs(house.plr)
[1] 1661
because of weights (which would not be taken into account in a glm() fit).
Because I only care about the raw number of observations, is there a (hopefully) trivial way of getting nobs(polr.fit) to behave like nobs(glm.fit)?
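
To make the question concrete, here is a sketch of one possible workaround, assuming the fit keeps its model frame (polr() does so by default with model = TRUE). The copy 'house' and the choice of which cell to set to NA are only for illustration:

```r
library(MASS)

# Copy of the housing data with a single NA introduced in the first row
house <- housing
house$Infl[1] <- NA

house.plr <- polr(Sat ~ Infl + Type + Cont, data = house, weights = Freq)

# nobs() for a polr fit reports the sum of the weights over the rows used
nobs(house.plr)
sum(house$Freq[-1])           # same value: total Freq without the dropped row

# Raw number of rows actually used: rows of the stored model frame
nrow(model.frame(house.plr))  # 71 of the 72 rows in housing
```

This counts every row that survives NA handling, so rows with zero weight would still be included. It also relies on the fitting function having a model.frame() method or storing the model frame, which is not guaranteed for every model class.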
BW
Federico
>
>> Brian
>>
>> Brian S. Cade, PhD
>>
>> U. S. Geological Survey
>> Fort Collins Science Center
>> 2150 Centre Ave., Bldg. C
>> Fort Collins, CO 80526-8818
>>
>> email: cadeb at usgs.gov <brian_cade at usgs.gov>
>> tel: 970 226-9326
>>
>>
>>
>> On Mon, Mar 18, 2013 at 8:39 AM, Marc Schwartz <marc_schwartz at me.com> wrote:
>>
>>>
>>> On Mar 18, 2013, at 7:36 AM, Federico Calboli <f.calboli at imperial.ac.uk>
>>> wrote:
>>>
>>>> Dear All,
>>>>
>>>> is there a simple way that covers all regression models to extract the
>>> number of samples from a data frame/matrix actually used in a regression
>>> model?
>>>>
>>>> For instance I might have a data set of 100 rows and 4 columns (1 response +
>>> 3 explanatory variables). If 3 samples have one or more NAs in the
>>> explanatory variable columns these samples will be dropped in any model:
>>>>
>>>> my.model = lm(y ~ x + w + z, my.data)
>>>> my.model = glm(y ~ x + w + z, my.data, family = binomial)
>>>> my.model = polr(y ~ x + w + z, my.data)
>>>> …
>>>>
>>>> I don't seem to be able to find one single method that works in the
>>> exact same way -- irrespective of the model type -- to interrogate my.model
>>> to see how many samples of my.data were actually used. Is there such
>>> function or do I need to hack something together?
>>>>
>>>> Best wishes
>>>>
>>>> Federico
>>>
>>>
>>> I don't know that this would be universal to all possible R model
>>> implementations, but should work for those that at least abide by "certain
>>> standards"[1] relative to the internal use of ?model.frame.
>>>
>>> In the case where model functions use 'model = TRUE' as the default in
>>> their call (eg. lm(), glm() and MASS::polr()), the returned model object
>>> will have a component called 'model', such that:
>>>
>>> nrow(my.model$model)
>>>
>>> returns the number of rows in the internally created data frame.
>>>
>>> Note that 'model = TRUE' is not the default for many functions, for
>>> example Terry's coxph() in survival or Frank's lrm() in rms.
>>>
>>> Note also that the value of 'na.action' in the modeling function call may
>>> have a potential effect on whether the number of rows in the retained
>>> 'model' data frame is really the correct value.
>>>
>>> You can also use model.frame(), independently matching arguments passed to
>>> the model function, to replicate what takes place internally in many
>>> modeling functions. The result of model.frame() will be a data frame,
>>> again, subject to similar limitations as above.
>>>
>>> Regards,
>>>
>>> Marc Schwartz
>>>
>>> [1]: http://developer.r-project.org/model-fitting-functions.txt
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>
>
> --
> Brian D. Ripley, ripley at stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595