[R] removing NA from a data frame
    Peter Ehlers 
    ehlers at ucalgary.ca
       
    Fri Jun 22 12:22:25 CEST 2012
    
    
  
On 2012-06-22 01:41, Stuart Leask wrote:
> Removing rows with NAs, using na.omit(), doesn't seem to be working for me.
>
> Dataset:
>
>> str ( ex10s )
>
> 'data.frame':   2189576 obs. of  5 variables:
> $ LOPNR  : int  58 58 58 58 64 64 64 64 64 64 ...
> $ DIAGNOS: Factor w/ 173 levels "F20","F200","F2000",..: 128 128 128 128 105 105 105 160 105 105 ...
> $ X_DATE : int  20060821 20061207 20080102 20090904 20010327 20010925 20020307 20021007 20021007 20030320 ...
> $ SOURCE : int  2 2 2 2 2 2 2 2 2 1 ...
> $ dg     : Factor w/ 7 levels "0","1","2","3",..: 6 6 6 6 5 5 5 6 5 5 ...
>
> The only NAs are in the factor dg (put in by 'recode' from the car library; I'm trying to eliminate cases with particular factor levels)
>
>> table ( ex10s$dg )
>
>        0       1       2       3       4       5      NA
>     2851  271501   63112   98425  335593 1257299  160795
This shows that what you think are missing values (NAs) are,
as far as R is concerned, ordinary values at the factor level "NA".
If you do
   levels(ex10s$dg)
you should see "NA" as one of the levels. This probably
resulted from incorrect data import. When you print ex10s$dg,
genuinely missing values would be printed as <NA>, not NA.
Either re-import the data or run
  is.na(ex10s$dg) <- ex10s$dg == "NA"
  ex10s$dg <- factor(ex10s$dg)   ## to remove the superfluous level
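
A minimal sketch of the same situation on a toy data frame may make this easier to check. The name df and its values are made up for illustration and are not taken from ex10s; the assumption is only that dg carries the string "NA" as a real factor level.

```r
# Toy data (hypothetical): "NA" here is a character string, so it becomes
# an ordinary factor level rather than a missing value.
df <- data.frame(id = 1:6,
                 dg = factor(c("0", "1", "NA", "5", "NA", "4")))

levels(df$dg)                    # "NA" is listed among the levels
table(df$dg, useNA = "ifany")    # no separate <NA> column appears
sum(is.na(df$dg))                # 0: nothing is actually missing
nrow(na.omit(df))                # 6: na.omit() has nothing to drop

# Mark the entries at level "NA" as genuinely missing,
# then rebuild the factor to drop the now-unused level.
is.na(df$dg) <- df$dg == "NA"
df$dg <- factor(df$dg)

clean <- na.omit(df)
nrow(clean)                      # 4
levels(clean$dg)                 # "0" "1" "4" "5"
```

After the is.na<- step, na.omit() and complete.cases() should both drop those rows, because the entries are now real missing values.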
Peter Ehlers
>
> So, I remove the rows with NAs, to a new dataframe ex10ss:
>
>> ex10ss<-na.omit(ex10s)
>
> Check all the NAs have been removed:
>
>> table(ex10ss$dg)
>
>        0       1       2       3       4       5      NA
>     2851  271501   63112   98425  335593 1257299  160795
>
>> dim(ex10s)
> [1] 2189576       5
>> dim(ex10ss)
> [1] 2189576       5
>
> Nothing seems to have changed. I want all the rows containing NA removed.
>
> I am clearly doing something wrong.
>
> The only alternative I could find is pretty similar:
> use <- complete.cases(ex10s)
> ex10ss<-ex10s[use,]
> which leads to the same result.
>
>
> Stuart
>
>
> Dr Stuart John Leask DM FRCPsych MB Mchir
> Clinical Senior Lecturer and Honorary Consultant Psychiatrist
> Institute of Mental Health, Innovation Park
> Triumph Road, Nottingham, Notts. NG7 2TU. UK
> Tel. +44 115 82 30419 stuart.leask at nottingham.ac.uk
> Google 'Dr Stuart Leask'
>
>