[R] Using na.omit() on the output of na.omit()

Peter Dalgaard pdalgd at gmail.com
Mon Mar 30 18:23:01 CEST 2026


The *names* would refer to the original df, or to a nebulous one that has the rows of the result plus undefined rows with names as in the na.action attribute. There's no problem with making the process cumulative: each call removes rows from x and adds their names to the list of removed row names.

Notice, however, that it would break rather badly if data frames were not required to have unique row names...

(Not that I think this is worth actually doing... I think the only code in (base) R that actually uses the attribute is naprint() and its naprint.omit() method.)
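A minimal sketch of the cumulative behaviour discussed above, assuming unique row names. The helper na_omit_cumulative() is hypothetical, not part of base R: it saves any existing "na.action" attribute, removes it so na.omit() cannot pass it through unchanged, and then concatenates the old and new omissions. Because the integer positions recorded by a later call refer to the intermediate data frame, only the names are meaningful across calls.

```r
## Hypothetical helper, not part of base R: a data frame version of
## na.omit() whose "na.action" attribute is augmented rather than replaced.
## Assumes unique row names, so names identify rows across calls.
na_omit_cumulative <- function(df) {
  old <- attr(df, "na.action")    # omissions recorded by earlier calls
  attr(df, "na.action") <- NULL   # stop na.omit() from passing it through
  out <- na.omit(df)
  new <- attr(out, "na.action")   # NULL if this call dropped no rows
  if (is.null(old) && is.null(new))
    return(out)
  ## Names refer to rows of the original data frame; the integer values
  ## from a later call are positions in the data frame passed to that call.
  omitted <- c(unclass(old), unclass(new))
  class(omitted) <- "omit"
  attr(out, "na.action") <- omitted
  out
}

DF  <- data.frame(x = c(1, 2, 3), y = c(NA, 10, 1))
DF1 <- na_omit_cumulative(DF)     # drops row "1"
DF1$z <- c(5, NA)                 # add a column with a missing value
DF2 <- na_omit_cumulative(DF1)    # drops row "3" as well
names(attr(DF2, "na.action"))     # "1" "3"
naprint(attr(DF2, "na.action"))   # summary string as used by model printing
```

Keying on names rather than positions is what makes the accumulation well defined, which is also why it depends on row names being unique.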

-pd

> On 30 Mar 2026, at 17.15, Michael Dewey <lists using dewey.myzen.co.uk> wrote:
> 
> 
> 
> On 30/03/2026 15:18, peter dalgaard wrote:
>> I'm fairly sure that this is deliberate, although hardly something that I bump into every day. na.omit & friends are designed to work in connection with modeling code and there may be code down the line that wants to know about omitted cases. So na.omit cleans the original data by removing some cases and attaching a list of omitted cases; if you do it again, you still have a dataframe and some omitted cases.
>> If anything, I would contend that omissions should be cumulative, so if you (say) add a column with some missing values and na.omit again, then the omission list should be augmented rather than replaced.
> 
> Cumulative seems a good idea, but since na.omit() records row numbers, which data frame would they refer to? Or have I misunderstood?
> 
> Michael
> 
>> Also, from a pragmatic viewpoint, there is a chance that na.omit gets called repeatedly by coincidence in the labyrinth of interrelated lm functions.
>> -pd
>>> On 27 Mar 2026, at 15.13, Shu Fai Cheung <shufai.cheung using gmail.com> wrote:
>>> 
>>> Hi All,
>>> 
>>> I noticed a behavior of na.omit() that I am not sure is intended.
>>> 
>>> This is adapted from the example in ?na.omit:
>>> 
>>>> DF <- data.frame(x = c(1, 2, 3), y = c(NA, 10, 1))
>>>> DF_omitted <- na.omit(DF)
>>>> attr(DF_omitted, "na.action")
>>> 1
>>> 1
>>> attr(,"class")
>>> [1] "omit"
>>>> DF_omitted2 <- na.omit(DF_omitted)
>>>> attr(DF_omitted2, "na.action")
>>> 1
>>> 1
>>> attr(,"class")
>>> [1] "omit"
>>> 
>>> In the first call to na.omit(), the output, DF_omitted, correctly has only
>>> two rows, with Row 1 removed, and 'na.action' stores the removed row number.
>>> 
>>> In the second call of na.omit(), no cases are removed because DF_omitted
>>> has no missing data. However, the attribute 'na.action' is retained,
>>> indicating that Row 1 was removed.
>>> 
>>> To my understanding of na.omit(), this occurred because, in the second call
>>> of na.omit(), no rows were omitted, and so the original object was
>>> returned, along with the attribute 'na.action' from the first call.
>>> 
>>> From the "perspective" of the second call, no rows were omitted. Should
>>> 'na.action' be NULL (i.e., not set), as in the following example?
>>> 
>>>> DF2 <- data.frame(x = c(1, 2, 3), y = c(3, 10, 1))
>>>> DF2_omitted <- na.omit(DF2)
>>>> attr(DF2_omitted, "na.action")
>>> NULL
>>> 
>>> Or is this behavior of na.omit() intended?
>>> 
>>> Regards,
>>> Shu Fai
>>> 
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
> 
> -- 
> Michael Dewey
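One way to see where the leftover attribute comes from, as far as I can tell from how na.omit()'s data frame method is written in the R sources: it sets "na.action" only when it actually drops rows, and it builds its result by row subsetting with `[`, which keeps the input's other attributes. So a second call that drops nothing hands back the first call's attribute untouched. The helper na_omit_fresh() below is hypothetical, not part of base R; it gives the NULL result Shu Fai expected, at the cost of losing the record of earlier omissions that, as Peter notes, modelling code may want.

```r
## Illustrates why the attribute survives a second call (base R only,
## plus one hypothetical helper, na_omit_fresh(), defined below).
DF <- data.frame(x = c(1, 2, 3), y = c(NA, 10, 1))
DF_omitted <- na.omit(DF)

## Row subsetting with `[` copies the data frame's other attributes,
## so an existing "na.action" is carried over unchanged.
kept <- DF_omitted[c(TRUE, TRUE), , drop = FALSE]
identical(attr(kept, "na.action"), attr(DF_omitted, "na.action"))  # TRUE

## Hypothetical helper, not part of base R: drop any earlier "na.action"
## first, so the attribute describes only the rows removed by this call.
na_omit_fresh <- function(df) {
  attr(df, "na.action") <- NULL
  na.omit(df)
}

DF_omitted2 <- na_omit_fresh(DF_omitted)
attr(DF_omitted2, "na.action")    # NULL: this call dropped nothing
```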


-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes using cbs.dk  Priv: PDalgd using gmail.com


