[R] Using na.omit() on the output of na.omit()

Mon Mar 30 10:12:47 CEST 2026

Hello,

My comments are inline.

At 03:34 on 30/03/2026, Shu Fai Cheung wrote:
> Hi All,
> 
> I would like to share more about the behavior of na.omit() from my
> previous email.
> 
> This is a test dataset:
> 
>> DF <- data.frame(x = c(1, 2, 3, 5), y = c(NA, 10, 1, 10))
> 
> One case omitted by na.omit()
>> DF_omitted <- na.omit(DF)
>> DF_omitted
>    x  y
> 2 2 10
> 3 3  1
> 4 5 10
>> na.action(DF_omitted)
> 1
> 1
> attr(,"class")
> [1] "omit"
> 
> Then I do regression on this DF_omitted (na.action can be omitted, but
> I included it just to show that na.omit is used)
> 
>> out1 <- lm(y ~ x,
> +            DF_omitted,
> +            na.action = na.omit)
>> summary(out1)
> 
> Call:
> lm(formula = y ~ x, data = DF_omitted, na.action = na.omit)
> 
> Residuals:
>       2      3      4
>   3.857 -5.786  1.929
> 
> Coefficients:
>              Estimate Std. Error t value Pr(>|t|)
> (Intercept)   4.8571    11.8885   0.409    0.753
> x             0.6429     3.3404   0.192    0.879
> 
> Residual standard error: 7.216 on 1 degrees of freedom
> Multiple R-squared:  0.03571,   Adjusted R-squared:  -0.9286
> F-statistic: 0.03704 on 1 and 1 DF,  p-value: 0.879
> 
>> na.action(out1)
> NULL
>> na.action(na.omit(DF_omitted))
> 1
> 1
> attr(,"class")
> [1] "omit"
> 
> The output of lm() states that no cases were removed. To me, this is a
> correct message because no cases were removed when fitting the model,
> that is, in *this call* of lm(). One case was deleted, but before
> being processed by lm(). It is not done by this call of lm().
> 
> If the model is fitted to the original data set, then it will report,
> again correctly, that *this call* of lm() removed one case:
> 
>> out2 <- lm(y ~ x,
> +            DF,
> +            na.action = na.omit)
>> summary(out2)
> 
> Call:
> lm(formula = y ~ x, data = DF, na.action = na.omit)
> 
> Residuals:
>       2      3      4
>   3.857 -5.786  1.929
> 
> Coefficients:
>              Estimate Std. Error t value Pr(>|t|)
> (Intercept)   4.8571    11.8885   0.409    0.753
> x             0.6429     3.3404   0.192    0.879
> 
> Residual standard error: 7.216 on 1 degrees of freedom
>    (1 observation deleted due to missingness)
> Multiple R-squared:  0.03571,   Adjusted R-squared:  -0.9286
> F-statistic: 0.03704 on 1 and 1 DF,  p-value: 0.879
> 
>> na.action(out2)
> 1
> 1
> attr(,"class")
> [1] "omit"
> 
> However, the behavior of na.omit() is different. If na.omit() is run
> again on DF_omitted, it will say that Case 1 was removed.
> 
>> na.action(na.omit(DF))
> 1
> 1
> attr(,"class")
> [1] "omit"
> 
> This behavior, to me, can be misleading because no cases were removed
> by *this call* of na.omit().

Yes, they were. You are running na.omit on the original DF, and na.action
correctly reports one case omitted.
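
To make the distinction concrete, here is a minimal sketch that counts how many rows each call has to drop. It assumes the same DF as in the quoted message; the variable names n_dropped_from_DF and n_dropped_from_DF_omitted are only illustrative.

```r
# Same test data as in the quoted message.
DF <- data.frame(x = c(1, 2, 3, 5), y = c(NA, 10, 1, 10))
DF_omitted <- na.omit(DF)

# Rows each call would drop: count the incomplete cases in its input.
n_dropped_from_DF         <- sum(!complete.cases(DF))
n_dropped_from_DF_omitted <- sum(!complete.cases(DF_omitted))

n_dropped_from_DF          # 1: the call on DF removes row 1
n_dropped_from_DF_omitted  # 0: the call on DF_omitted removes nothing

# The row counts before and after each call agree with that.
nrow(DF) - nrow(na.omit(DF))                  # 1
nrow(DF_omitted) - nrow(na.omit(DF_omitted))  # 0
```

Comparing row counts (or using complete.cases() directly) answers the question of what a particular call removed without relying on the "na.action" attribute.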

> 
> Moreover, from the help page of na.omit():
> 
> 'If na.omit removes cases, the row numbers of the cases form the
> "na.action" attribute of the result, of class "omit".'
> 
> It does not state what happens if na.omit() does not remove cases.
> From the example of lm() above, na.omit() will report/store that no
> cases were removed (though this is actually the behavior of
> model.frame()/lm(), not of na.omit()). However, when used alone,
> na.omit() will do nothing.
> 
> Changing this behavior of na.omit() may break a lot of things. 

I don't understand how it can break something. The coefs, degrees of
freedom, etc., of the two lm calls are identical. Nothing is broken.
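
A quick way to check this, as a sketch under the assumption that out1 and out2 are fitted as in the quoted message (here out1 is given na.omit(DF) directly instead of the intermediate DF_omitted):

```r
DF <- data.frame(x = c(1, 2, 3, 5), y = c(NA, 10, 1, 10))

# out1: the data were cleaned before the call; out2: lm() does the cleaning.
out1 <- lm(y ~ x, data = na.omit(DF), na.action = na.omit)
out2 <- lm(y ~ x, data = DF, na.action = na.omit)

# The fits themselves agree.
all.equal(coef(out1), coef(out2))        # TRUE
df.residual(out1) == df.residual(out2)   # TRUE, 1 in both fits
all.equal(residuals(out1), residuals(out2))

# Only the bookkeeping differs: out1 has no record, out2 records row 1.
is.null(na.action(out1))                 # TRUE
na.action(out2)
```

The only difference between the two objects is the stored record of which rows lm() itself dropped.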

> If this
> behavior of na.omit(), doing nothing, is intended (though unnatural to
> me), perhaps this should be stated in the help page?

But it does, doesn't it?
In section Description:

 > na.omit returns the object with incomplete cases removed.

Then, in Details:

 > If na.omit removes cases, the row numbers of the cases form the
 > "na.action" attribute of the result, of class "omit".

Note the "If". If there are no NAs, then nothing is done. This is
already made explicit in Description.
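
The following sketch illustrates the "nothing is done" case, and one possible way to get a record that describes only the most recent call. Note that na_omit_fresh() is a hypothetical helper written for this illustration; it is not part of R.

```r
# No missing values: na.omit() removes nothing and sets no attribute.
DF2 <- data.frame(x = c(1, 2, 3), y = c(3, 10, 1))
is.null(attr(na.omit(DF2), "na.action"))   # TRUE

# Input that already carries the attribute from an earlier call.
DF <- data.frame(x = c(1, 2, 3), y = c(NA, 10, 1))
DF_omitted <- na.omit(DF)
# Nothing is removed here; as reported in the thread, the earlier
# record of row 1 is simply carried along with the object.
attr(na.omit(DF_omitted), "na.action")

# Hypothetical helper (not part of R): clear any earlier record first,
# so the attribute reflects only what this call removes.
na_omit_fresh <- function(object, ...) {
  attr(object, "na.action") <- NULL
  na.omit(object, ...)
}
is.null(attr(na_omit_fresh(DF_omitted), "na.action"))   # TRUE
```

Whether such a wrapper is worth having depends on whether downstream code reads the attribute at all; the data returned are the same either way.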

NAs are always a problem, but in this case you are wrong: you are
reading too much into na.omit's intended behavior.

Furthermore, if a function that has been part of R since the beginning
and that is used countless times by countless users were broken, someone
would have noticed it already. No one has, because it doesn't break things.

Hope this helps,

Rui Barradas

> 
> Regards,
> Shu Fai
> 
> On Fri, Mar 27, 2026 at 10:13 PM Shu Fai Cheung <shufai.cheung using gmail.com> wrote:
>>
>> Hi All,
>>
>> I noticed a behavior of na.omit() that I am not sure is intended.
>>
>> This is adapted from the example of na.omit()
>>
>>> DF <- data.frame(x = c(1, 2, 3), y = c(NA, 10, 1))
>>> DF_omitted <- na.omit(DF)
>>> attr(DF_omitted, "na.action")
>> 1
>> 1
>> attr(,"class")
>> [1] "omit"
>>> DF_omitted2 <- na.omit(DF_omitted)
>>> attr(DF_omitted2, "na.action")
>> 1
>> 1
>> attr(,"class")
>> [1] "omit"
>>
>> In the first call to na.omit(), the output, DF_omitted, correctly has only two rows, with Row 1 removed, and 'na.action' stores the removed row number.
>>
>> In the second call of na.omit(), no cases are removed because DF_omitted has no missing data. However, the attribute 'na.action' is retained, indicating that Row 1 was removed.
>>
>> To my understanding of na.omit(), this occurred because, in the second call of na.omit(), no rows were omitted, and so the original object was returned, along with the attribute 'na.action' from the first call.
>>
>> From the "perspective" of the second call, no rows were omitted. Should 'na.action' be NULL (i.e., not set), as in the following example?
>>
>>> DF2 <- data.frame(x = c(1, 2, 3), y = c(3, 10, 1))
>>> DF2_omitted <- na.omit(DF2)
>>> attr(DF2_omitted, "na.action")
>> NULL
>>
>> Or is this behavior of na.omit() intended?
>>
>> Regards,
>> Shu Fai