[R] format.data.frame and NA control
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Mon Oct 22 20:18:29 CEST 2007
Sebastian P. Luque wrote:
> Hi,
>
> Is there a more efficient way to output NA strings as empty strings in
> format.data.frame than this:
>
> ---<---------------cut here---------------start-------------->---
> R> tt <- data.frame(a=c(NA, rnorm(8), NA), b=c(NA, letters[1:8], NA))
> R> tt <- format(tt, digits=5, trim=TRUE)
> R> tt
>            a  b
> 1         NA NA
> 2   2.012460  a
> 3   0.364181  b
> 4   1.398317  c
> 5   0.730969  d
> 6  -1.321741  e
> 7   0.081472  f
> 8   2.019201  g
> 9   0.090003  h
> 10        NA NA
> R> as.data.frame(lapply(tt, function(x) {x[x == "NA"] <- ""; x}))
>            a b
> 1
> 2   2.012460 a
> 3   0.364181 b
> 4   1.398317 c
> 5   0.730969 d
> 6  -1.321741 e
> 7   0.081472 f
> 8   2.019201 g
> 9   0.090003 h
> 10
> ---<---------------cut here---------------end---------------->---
>
> Thanks.
>
I suspect that there's a bug lurking in here. I get
> format(c(1,NA),na.encode=TRUE)
[1] " 1" "NA"
> format(c(1,NA),na.encode=FALSE)
[1] " 1" "NA"
I.e., they give the same result, whereas I would expect the latter to give
> c("1",NA)
[1] "1" NA
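A quick way to check the behaviour is to ask which elements come back as a real NA rather than the two-character string "NA". This is a minimal sketch; num_enc and chr_enc are hypothetical names. Note that the current ?format help page states that na.encode applies only to elements of character vectors, while numeric, complex and logical NAs are always encoded as "NA", so on a recent R version only the character vector is expected to keep its NA.

```r
# Format a numeric and a character vector, both with na.encode = FALSE.
num_enc <- format(c(1, NA), na.encode = FALSE)    # numeric NA
chr_enc <- format(c("1", NA), na.encode = FALSE)  # character NA

# TRUE marks an element that is still a real NA after formatting.
print(is.na(num_enc))
print(is.na(chr_enc))
```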
The point is that if NA had been passed through like that, you could
simply have used print(tt, na.print="", ...), but as it is:
> print(tt, na.print="", digits=5)
          a b
1        NA
2   0.60110 a
3   0.40988 b
4  -1.45437 c
5   1.58159 d
6   0.52801 e
7  -0.52988 f
8  -1.63540 g
9  -0.38973 h
10       NA
... it only works on character columns, not the numeric ones.
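One possible workaround, sketched here and not proposed in the thread, is to record where the NAs are in the unformatted data and blank exactly those cells after formatting. Unlike matching the string "NA", this cannot accidentally blank a genuine "NA" string. The sketch assumes a recent R version (4.x), where format() on a data frame returns character columns; the data, the seed and the name ff are made up for illustration.

```r
# Made-up data in the spirit of the original post; the seed is arbitrary.
set.seed(1)
tt <- data.frame(a = c(NA, rnorm(8), NA),
                 b = c(NA, letters[1:8], NA),
                 stringsAsFactors = FALSE)

# Format every column to character, as in the original post.
ff <- format(tt, digits = 5, trim = TRUE)

# is.na() on a data frame gives a logical matrix of the same shape,
# which `[<-.data.frame` accepts as an index: blank only the original NAs.
ff[is.na(tt)] <- ""

print(ff)
```

Because the mask comes from tt rather than from the formatted strings, it works the same way for numeric and character columns.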
--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
More information about the R-help mailing list