[R] Strange dataframe behavior
Sergey Goriatchev
sergeyg at gmail.com
Tue Oct 23 19:10:23 CEST 2007
Hello,
I have a question regarding the following output:
> database <- read.delim(file=path.input.file, header=TRUE, dec=".", sep="\t", na.strings = "#NV")
> str(database)
'data.frame': 314 obs. of 13 variables:
$ S : Factor w/ 314 levels "307073","400212",..: 147 72 299 137 162 62 189 236 134 307 ...
$ A : Factor w/ 314 levels "Alfa",...: 285 258 197 3 81 162 183 272 73 301 ...
$ M: Factor w/ 19 levels "@NA","A",..: 18 10 11 6 7 12 17 17 11 6 ...
$ W : num 0 0 0 0 0 ...
$ T : num 0.0467 0.1095 0.0252 0.0821 -0.0275 ...
$ C : num 0 0 0 0 0 ...
$ MF : num -0.658 0.261 0.922 -1.897 -1.884 ...
$ V : num 0.0585 -1.0852 -0.3156 -1.0592 0.2810 ...
$ G : num -0.568 -1.302 0.225 -1.473 -0.541 ...
$ Mo : num 0.34967 0.42807 -0.41407 -0.18216 -0.00305 ...
$ R : num -0.5413 -2.0000 0.5353 -1.1437 -0.0776 ...
$ Tr : num -0.12816 1.04148 0.00647 -0.02424 -1.66834 ...
$ Su : num -1.611 1.160 -0.528 -0.091 -1.148 ...
> which(is.na(database))
[1] 675 704 774 887
So I have 314 observations, yet which() reports NAs at positions well beyond 314!
I remove one observation (for certain reasons), and remove the
corresponding factor level, then:
> str(database)
'data.frame': 313 obs. of 13 variables:
....
> which(is.na(database))
[1] 673 702 772 885
Removing ONE observation moves the NA positions by two.
Does anyone have an idea what these NA positions mean?
Thanks in advance for your time and help!
Sergey
University of Zurich
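
A possible reading of the output above, offered as a sketch rather than a confirmed diagnosis: is.na() on a data frame returns a logical matrix, and which() without arr.ind = TRUE counts through that matrix column by column, so the reported positions can exceed the number of rows. The toy data frame `df` and the helper `decode()` below are hypothetical and only illustrate the arithmetic; the index values and row counts are taken from the post.

```r
# Toy data frame standing in for the poster's data (hypothetical values).
df <- data.frame(
  S = c("a", "b", "c", "d", "e"),
  A = c("v", "w", "x", "y", "z"),
  M = c("A", NA, "B", NA, "C"),
  stringsAsFactors = TRUE
)

# is.na() gives a 5 x 3 logical matrix; which() returns column-major
# linear indices into it, so they can be larger than nrow(df).
idx <- which(is.na(df))

# arr.ind = TRUE reports row and column positions instead.
pos <- which(is.na(df), arr.ind = TRUE)

# Dropping a row that lies below the NAs shortens each of the two
# columns before M by one element, so every linear index drops by 2.
df2 <- df[-5, ]
idx2 <- which(is.na(df2))

# Hypothetical helper: turn linear indices into (row, col) for n rows.
decode <- function(i, n) {
  cbind(row = (i - 1) %% n + 1, col = (i - 1) %/% n + 1)
}

# Indices and row counts as reported in the post.
before <- decode(c(675, 704, 774, 887), 314)
after  <- decode(c(673, 702, 772, 885), 313)
print(before)
print(after)
```

Under this reading, all four positions decode to column 3 (M), rows 47, 76, 146 and 259, both before and after the removal. That would be consistent with four cells in M matching the na.strings value "#NV", and with the removed observation lying below row 259. The shift of two would then come from the two columns that precede M each losing one element. Note that the factor level "@NA" in M is an ordinary string as far as R is concerned, not a missing value.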