[R] problem replacing NA's in a dataset (10% remain after removal attempt)
Julie Shoemaker
jshoemak at fas.harvard.edu
Thu Jul 19 20:05:41 CEST 2012
Hi all,
I'm attempting to gap-fill a dataset, replacing the missing values with
each month's day or night median value.
The problem is that my code replaces some, but not all, of the NAs, and
I cannot figure out how this is possible. When I look at the individual
rows where the NAs remain, they should have been captured by my code as
far as I can tell. Here is an example:
The dataset is 4464x14 and is called hourly.data.
I've already replaced all NaN values with NA.
# filPFD is a column of ambient light levels; it has no NA values, all
#   values are real and either 0 or >0
# month is a column with values between 7 and 12 depending on the month
#   the data was collected
# fillCH4 is a column containing CH4 flux data that I am trying to gap-fill
# night_median and day_median are 1x6 vectors with the median flux values
#   for each month
temp<-hourly.data[hourly.data$month==7,]
darkmonth<-(temp$filPFD==0)
daymonth<-(temp$filPFD>0)
temp[is.na(temp[darkmonth,"fillCH4"]),"fillCH4"]<-night_median[1]
temp[is.na(temp[daymonth,"fillCH4"]),"fillCH4"]<-day_median[1]
hourly.data[hourly.data$month==7,"fillCH4"]<-temp$fillCH4
This code replaces the majority of the NAs, but maybe 10% remain. The
cases that I have isolated all have a value of 7 in the "month" column
and real values in the "filPFD" column.
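
A small reproduction on toy data may help narrow this down. The sketch
below assumes the cause is an index-length mismatch: is.na(temp[darkmonth,
"fillCH4"]) has only sum(darkmonth) elements, so when it is used to index
all rows of temp it is recycled and no longer lines up with the rows it
was computed from. The toy values, the placeholder medians, and the names
bad and good are made up for illustration and are not taken from
hourly.data.

```r
# Toy data: hypothetical values, not taken from hourly.data
temp <- data.frame(
  filPFD  = c(0, 0, 5, 5, 5, 5),
  fillCH4 = c(NA, 1, 2, NA, 3, 4)
)
night_median <- c(10)  # placeholder median for the first month
day_median   <- c(20)  # placeholder median for the first month

darkmonth <- temp$filPFD == 0
daymonth  <- temp$filPFD > 0

# Pattern from the post: the inner subset is shorter than nrow(temp),
# so the logical row index is recycled against the wrong rows.
bad <- temp
bad[is.na(bad[darkmonth, "fillCH4"]), "fillCH4"] <- night_median[1]
bad[is.na(bad[daymonth,  "fillCH4"]), "fillCH4"] <- day_median[1]

# Possible alternative: build one full-length index by combining the
# NA test and the day/night test, then assign into the column directly.
good <- temp
good$fillCH4[is.na(good$fillCH4) & darkmonth] <- night_median[1]
good$fillCH4[is.na(good$fillCH4) & daymonth]  <- day_median[1]

print(bad)
print(good)
```

If this is the cause, the mismatched version would also overwrite some
non-missing values, so comparing the filled column against the original
for rows that were not NA might be a useful check.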
Any thoughts? Am I missing something obvious? Is there any way these
values could be coming up as NA but belong to some different
classification such that they are not picked up by the is.na function?
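
On the classification question, a quick check of how is.na treats
different kinds of values might rule that possibility in or out. The
vectors x_num and x_char below are hypothetical examples; the same calls
could be run on the fillCH4 column itself.

```r
# Hypothetical vectors for illustration only
x_num  <- c(1.5, NA, NaN)
x_char <- c("1.5", "NA", NA)

# For numeric data, is.na() is TRUE for both NA and NaN
num_flags <- is.na(x_num)

# For character data, the string "NA" is not a missing value;
# only a real NA is flagged
char_flags <- is.na(x_char)

print(num_flags)
print(char_flags)

# class() shows whether a column was read in as text or as a factor
print(class(x_num))
print(class(x_char))
```

If class(hourly.data$fillCH4) reports "numeric", a text "NA" cannot be
hiding in the column, which would point back to the indexing rather than
to is.na.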
Best,
Julie
__________________________________
Julie Shoemaker, PhD
Postdoctoral Research Associate
Harvard University
phone: (617) 384-7237
email: jshoemak at fas.harvard.edu