[R] Using na.omit() on the output of na.omit()

Rui Barradas ru|pb@rr@d@@ @end|ng |rom @@po@pt
Tue Mar 31 19:41:27 CEST 2026


At 00:27 on 31/03/2026, Shu Fai Cheung wrote:
> Thanks for the comment. Just to share a little bit more. I used lm()
> in another post, but I will use model.frame here because it is
> model.frame that uses the na.action argument.
> 
> This is the example:
> 
> dat <- data.frame(
>    x = 1:5,
>    y = 1:5,
>    z = 1:5
> )
> dat[1, "x"] <- NA
> dat[2, c("y", "z")] <- NA
> dat_omitted <- na.omit(dat)
> head(dat)
> #>    x  y  z
> #> 1 NA  1  1
> #> 2  2 NA NA
> #> 3  3  3  3
> #> 4  4  4  4
> #> 5  5  5  5
> head(dat_omitted)
> #>   x y z
> #> 3 3 3 3
> #> 4 4 4 4
> #> 5 5 5 5
> 
> tmp1 <- model.frame(y ~ x + z, dat, na.action = na.omit)
> tmp2 <- model.frame(y ~ x + z, dat_omitted, na.action = na.omit)
> 
> na.action(tmp1)
> #> 1 2
> #> 1 2
> #> attr(,"class")
> #> [1] "omit"
> na.action(tmp2)
> #> NULL
> 
> tmp3 <- na.omit(dat)
> tmp4 <- na.omit(dat_omitted)
> 
> na.action(tmp3)
> #> 1 2
> #> 1 2
> #> attr(,"class")
> #> [1] "omit"
> na.action(tmp4)
> #> 1 2
> #> 1 2
> #> attr(,"class")
> #> [1] "omit"
> 
> This example has two data frames: dat with missing data, and
> dat_omitted with cases omitted by na.omit, with an na.action
> attribute.
> 
> When the input is dat_omitted, model.frame, which also stores cases
> removed, if any, does not report that cases were omitted by it.
> 
> For na.omit, on the other hand, it will pass the original object
> (dat_omitted) unchanged, including the na.action attributes, if any.
> Therefore, when na.action is used on the output (tmp4), or the
> na.action attributes retrieved, there is no way to know whether the
> cases were omitted by this call of na.omit or by a previous call,
> unlike model.frame.
> 
> After studying this example, what puzzled me was no longer the
> behavior of na.omit, but the difference in the behaviors of these two
> functions when processing an object with the na.action attribute.
> 
> But I now think it was designed that way. I will do my work taking
> into account the current behavior of na.omit. I'm sharing this
> observation, which is new to me (and probably only me), in case it is
> useful to others.
> 
> Regards,
> Shu Fai
> 
> 
> On Mon, Mar 30, 2026 at 10:18 PM peter dalgaard <pdalgd using gmail.com> wrote:
>>
>> I'm fairly sure that this is deliberate, although hardly something that I bump into every day. na.omit & friends are designed to work in connection with modeling code and there may be code down the line that wants to know about omitted cases. So na.omit cleans the original data by removing some cases and attaching a list of omitted cases; if you do it again, you still have a dataframe and some omitted cases.
>>
>> If anything, I would contend that omissions should be cumulative, so if you (say) add a column with some missing values and na.omit again, then the omission list should be augmented rather than replaced.
>>
>> Also, from a pragmatic viewpoint, there is a chance that na.omit gets called repeatedly by coincidence in the labyrinth of interrelated lm functions.
>>
>> -pd
>>
>>> On 27 Mar 2026, at 15.13, Shu Fai Cheung <shufai.cheung using gmail.com> wrote:
>>>
>>> Hi All,
>>>
>>> I noticed a behavior of na.omit() that I am not sure is intended.
>>>
>>> This is adapted from the example of na.omit()
>>>
>>>> DF <- data.frame(x = c(1, 2, 3), y = c(NA, 10, 1))
>>>> DF_omitted <- na.omit(DF)
>>>> attr(DF_omitted, "na.action")
>>> 1
>>> 1
>>> attr(,"class")
>>> [1] "omit"
>>>> DF_omitted2 <- na.omit(DF_omitted)
>>>> attr(DF_omitted2, "na.action")
>>> 1
>>> 1
>>> attr(,"class")
>>> [1] "omit"
>>>
>>> In the first call to na.omit(), the output, DF_omitted, correctly has only
>>> two rows, with Row 1 removed, and 'na.action' stores the removed row number.
>>>
>>> In the second call of na.omit(), no cases are removed because DF_omitted
>>> has no missing data. However, the attribute 'na.action' is retained,
>>> indicating that Row 1 was removed.
>>>
>>> To my understanding of na.omit(), this occurred because, in the second call
>>> of na.omit(), no rows were omitted, and so the original object was
>>> returned, along with the attribute 'na.action' from the first call.
>>>
>>>  From the "perspective" of the second call, no rows were omitted.
>>> Should 'na.action'
>>> be NULL (i.e., not set), as in the following example?
>>>
>>>> DF2 <- data.frame(x = c(1, 2, 3), y = c(3, 10, 1))
>>>> DF2_omitted <- na.omit(DF2)
>>>> attr(DF2_omitted, "na.action")
>>> NULL
>>>
>>> Or is this behavior of na.omit() intended?
>>>
>>> Regards,
>>> Shu Fai
>>>
>>>
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> --
>> Peter Dalgaard, Professor,
>> Center for Statistics, Copenhagen Business School
>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>> Phone: (+45)38153501
>> Office: A 4.23
>> Email: pd.mes using cbs.dk  Priv: PDalgd using gmail.com
>>
> 

Hello,

Let me try to explain the way I see it.

You cannot compare the output of na.omit with the output of model.frame:
they are two completely different functions, with completely different
use-cases and goals.
Both na.omit and model.frame are generic, so what your examples actually
call are the methods na.omit.data.frame and model.frame.default.
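
A quick way to check which methods are involved, as a minimal sketch
(the toy data frame `d` is made up for the illustration; getS3method()
comes from package utils):

```r
# Toy data, invented for this illustration
d <- data.frame(x = c(1, NA, 3), y = c(1, 2, 3))

# Retrieve the two methods that the examples in this thread end up calling
m_omit  <- getS3method("na.omit", "data.frame")
m_frame <- getS3method("model.frame", "default")

# Calling the generic and calling the method directly give the same result
same_omit  <- identical(na.omit(d), m_omit(d))
same_frame <- identical(model.frame(y ~ x, d), m_frame(y ~ x, d))
same_omit
same_frame
```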

Start with the latter, model.frame.default.

From the documentation, help("model.frame"), section Arguments,
argument dots:

...	
for model.frame methods, a mix of further arguments such as data, 
na.action, subset to pass to the default method. Any additional 
arguments (such as offset and weights or other named arguments) which 
reach the default method are used to create further columns in the model 
frame, with parenthesised names such as "(offset)".



So, if you pass arguments like `offset` or `weights`, they will be
included in the returned data.frame with names `(offset)` or
`(weights)`, respectively.
This is done in C code, called by the R function at code line [1], file
src/library/stats/R/models.R.
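
To see the parenthesised names in practice, here is a small sketch. The
data frame `d` and its column `w` are invented for the illustration:

```r
# Toy data, invented for this illustration; w plays the role of weights
d <- data.frame(x = 1:5, y = c(2, 4, 5, 4, 6), w = c(1, 1, 2, 2, 1))

# weights is not a formal argument of model.frame.default, so it travels
# through the dots and becomes an extra column of the model frame
mf <- model.frame(y ~ x, data = d, weights = w)
names(mf)
#> [1] "y"         "x"         "(weights)"
```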

The C code, file src/library/stats/src/model.c, after a preamble and 
some sanity checks, assembles a SEXP data and a SEXP names starting in 
line [2]. This `data` object includes the relevant variables in the 
`data` argument of R function model.frame.default plus the variables 
passed on to the C code in the dots argument.
Note that both `data` and `names` have size equal to nvars + nactualdots.

(The extra variables' names, such as `(offset)` or `(weights)` are 
created shortly after with snprintf in line 125.)

Then, much further down the code, on line [3], the R function given as
na.action (here na.omit, which dispatches to na.omit.data.frame) is
called with `data` as an argument.
(This is a good example of calling R from C, btw. Another example is in 
Writing R Extensions, section 5.11, the lapply2 example.)

The point of this already lengthy explanation is to show that the `data`
passed on to na.omit.data.frame is not the `data` model.frame.default
got. What na.omit.data.frame gets is the new SEXP object, also named
`data`, assembled by the C function modelframe.
And this data set does not have a "na.action" attribute set. So when you
pass `dat`, na.omit will omit the NA's because they are present in the
data, but when you pass `dat_omitted` it will not: there is nothing left
to remove.
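
The idea can be imitated at R level. This is only a rough sketch of what
the C code does, not a copy of it: the object names `vars` and `fresh`
are mine, and row names are not preserved here as model.frame would do.

```r
# Data from the example earlier in the thread
dat <- data.frame(x = 1:5, y = 1:5, z = 1:5)
dat[1, "x"] <- NA
dat[2, c("y", "z")] <- NA
dat_omitted <- na.omit(dat)

# Evaluate the formula's variables in the data and assemble a new data
# frame from them; attributes of dat_omitted are not carried over
vars <- eval(attr(terms(y ~ x + z), "variables"), dat_omitted)
names(vars) <- c("y", "x", "z")
fresh <- as.data.frame(vars)

is.null(attr(dat_omitted, "na.action"))     # FALSE: the input has it
is.null(attr(fresh, "na.action"))           # TRUE: the new frame does not
is.null(attr(na.omit(fresh), "na.action"))  # TRUE: nothing to omit
```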

The C function is even careful enough to copy `data`'s attributes to 
`ans`, in case na.action lost (some of) them. This is done starting in 
line 220, near the end of the function.

So if you are writing a fitting function and want to write a model.frame 
method, you will get a data set, for instance `dat_omitted`, possibly 
with a "na.action" attribute set. And you can let the C function take 
care of doing what it does best, then see if `dat_omitted`'s attributes 
are still relevant to the rest of your code.
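
As a sketch of that last step, a fitting function could look at both
places. The helper `omitted_rows` below is hypothetical (it is not part
of R); it only separates rows dropped before the call from rows dropped
by model.frame itself:

```r
# Hypothetical helper, not part of R: report the row names omitted
# before this call and those omitted by model.frame in this call
omitted_rows <- function(formula, data) {
    prior <- attr(data, "na.action")    # set by an earlier na.omit, if any
    mf <- model.frame(formula, data, na.action = na.omit)
    current <- attr(mf, "na.action")    # set by this model.frame call, if any
    list(prior = names(prior), current = names(current))
}

# Data from the example earlier in the thread
dat <- data.frame(x = 1:5, y = 1:5, z = 1:5)
dat[1, "x"] <- NA
dat[2, c("y", "z")] <- NA
dat_omitted <- na.omit(dat)

res_dat     <- omitted_rows(y ~ x + z, dat)          # current: "1" "2"
res_omitted <- omitted_rows(y ~ x + z, dat_omitted)  # prior:   "1" "2"
```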

Like others have said, na.omit's behavior should be and is cumulative,
and I understand your surprise with model.frame's behavior, which is not
cumulative. But I also believe that you are mixing two completely
different things. Don't expect model.frame to keep the "na.action"
attribute if there is nothing to remove. NA removal is not what
model.frame is meant to do; it has provisions to remove NA's if
necessary, that's all.


[ off-topic, nitpick: in na.omit.data.frame, variable `omit` is logical 
and the condition in [4] could be simplified to if(any(omit)) ]
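
A small check of the nitpick, assuming the condition at [4] has the form
any(omit > 0L) (that is my reading of the source, it is not quoted in
this thread). For a logical vector without NA's the two forms agree:

```r
# omit is a logical vector, one element per row (values invented here)
omit_some <- c(FALSE, TRUE, FALSE)
omit_none <- c(FALSE, FALSE, FALSE)

# Comparing a logical with 0L coerces it to integer first, so the
# comparison adds nothing when omit is already logical
agree_some <- identical(any(omit_some > 0L), any(omit_some))
agree_none <- identical(any(omit_none > 0L), any(omit_none))
agree_some
agree_none
```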



[1] 
https://github.com/wch/r-source/blob/trunk/src/library/stats/R/models.R#L566
[2] 
https://github.com/wch/r-source/blob/trunk/src/library/stats/src/model.c#L111
[3] 
https://github.com/wch/r-source/blob/trunk/src/library/stats/src/model.c#L210
[4] 
https://github.com/wch/r-source/blob/trunk/src/library/stats/R/nafns.R#L83


Hope this helps,

Rui Barradas


