[R] Using na.omit() on the output of na.omit()

peter dalgaard pdalgd using gmail.com
Mon Mar 30 16:18:43 CEST 2026


I'm fairly sure that this is deliberate, although hardly something that I bump into every day. na.omit & friends are designed to work in connection with modeling code and there may be code down the line that wants to know about omitted cases. So na.omit cleans the original data by removing some cases and attaching a list of omitted cases; if you do it again, you still have a dataframe and some omitted cases. 
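
As a rough illustration of that bookkeeping (a minimal sketch using the toy data from the quoted example below; the names once, twice and fit are only for illustration), the record made by the first call simply travels along with the data frame, and modeling code such as lm() exposes the same kind of record through na.action():

```r
# Toy data adapted from the quoted example.
DF <- data.frame(x = c(1, 2, 3), y = c(NA, 10, 1))

once  <- na.omit(DF)    # drops row 1 and records it in "na.action"
twice <- na.omit(once)  # finds nothing further to drop

nrow(once)                # two rows remain
attr(once, "na.action")   # named integer of class "omit"

# The second call leaves the earlier record in place.
identical(attr(once, "na.action"), attr(twice, "na.action"))

# Modeling code keeps the same kind of record for later use
# (for example by residuals() or predict()).
fit <- lm(y ~ x, data = DF, na.action = na.omit)
na.action(fit)
```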

If anything, I would contend that omissions should be cumulative, so if you (say) add a column with some missing values and na.omit again, then the omission list should be augmented rather than replaced.
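
To make the cumulative idea concrete, here is a sketch under stated assumptions: na_omit_cumulative() is a hypothetical helper, not something in base R, and it identifies omitted cases by row name, because the positions stored by successive calls refer to different (already shortened) data frames.

```r
# Hypothetical helper, not part of base R: like na.omit(), but an existing
# "na.action" attribute is augmented rather than replaced.
na_omit_cumulative <- function(df) {
  previous <- attr(df, "na.action")
  out <- na.omit(df)
  # Merge only when this call actually dropped rows and there is an
  # earlier record to merge with.
  if (nrow(out) < nrow(df) && !is.null(previous)) {
    merged <- c(unclass(previous), unclass(attr(out, "na.action")))
    class(merged) <- "omit"
    attr(out, "na.action") <- merged
  }
  out
}

DF <- data.frame(x = c(1, 2, 3), y = c(NA, 10, 1))
step1 <- na.omit(DF)      # drops the row named "1"
step1$z <- c(NA, 5)       # new column, missing in the row named "2"
step2 <- na_omit_cumulative(step1)

# The names identify the omitted cases; the stored values are positions
# relative to whichever data frame each call saw, so only the names are
# comparable across calls.
names(attr(step2, "na.action"))
```

Plain na.omit(step1) would record only the row named "2" here, which is the replacement behaviour I am arguing against.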

Also, from a pragmatic viewpoint, there is a chance that na.omit gets called repeatedly by coincidence in the labyrinth of interrelated lm functions.

-pd

> On 27 Mar 2026, at 15.13, Shu Fai Cheung <shufai.cheung using gmail.com> wrote:
> 
> Hi All,
> 
> I noticed a behavior of na.omit() that I am not sure is intended.
> 
> This is adapted from the example of na.omit()
> 
>> DF <- data.frame(x = c(1, 2, 3), y = c(NA, 10, 1))
>> DF_omitted <- na.omit(DF)
>> attr(DF_omitted, "na.action")
> 1
> 1
> attr(,"class")
> [1] "omit"
>> DF_omitted2 <- na.omit(DF_omitted)
>> attr(DF_omitted2, "na.action")
> 1
> 1
> attr(,"class")
> [1] "omit"
> 
> In the first call to na.omit(), the output, DF_omitted, correctly has only
> two rows, with Row 1 removed, and 'na.action' stores the removed row number.
> 
> In the second call of na.omit(), no cases are removed because DF_omitted
> has no missing data. However, the attribute 'na.action' is retained,
> indicating that Row 1 was removed.
> 
> To my understanding of na.omit(), this occurred because, in the second call
> of na.omit(), no rows were omitted, and so the original object was
> returned, along with the attribute 'na.action' from the first call.
> 
> From the "perspective" of the second call, no rows were omitted. Should
> 'na.action' be NULL (i.e., not set), as in the following example?
> 
>> DF2 <- data.frame(x = c(1, 2, 3), y = c(3, 10, 1))
>> DF2_omitted <- na.omit(DF2)
>> attr(DF2_omitted, "na.action")
> NULL
> 
> Or is this behavior of na.omit() intended?
> 
> Regards,
> Shu Fai

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes using cbs.dk  Priv: PDalgd using gmail.com