[R] list to dataframe conversion-testing for identical

arun smartpink111 at yahoo.com
Mon Jul 2 02:18:13 CEST 2012


Hi All,

Thanks for your replies.

A.K.



----- Original Message -----
From: David Winsemius <dwinsemius at comcast.net>
To: arun <smartpink111 at yahoo.com>
Cc: R help <r-help at r-project.org>
Sent: Sunday, July 1, 2012 6:31 PM
Subject: Re: [R] list to dataframe conversion-testing for identical


On Jul 1, 2012, at 5:09 PM, David L Carlson wrote:

> Yes it does have something to do with the representation of floating point
> numbers. Using cbind() forces the list to become a matrix and that forces
> all of the data to become character strings since one of the list elements
> is character:
> 
>> set.seed(42)
>> listdat1<-list(rnorm(10,20),rep(LETTERS[1:2],5),rep(1:5,2))
>> str(do.call("cbind", listdat1))
> chr [1:10, 1:3] "21.3709584471467" "19.4353018286039" ...
> Then you convert that to a data.frame. The default in data.frame() is to
> convert characters to factors so you get
> 
>> str(data.frame(do.call("cbind",listdat1)))
> 'data.frame':   10 obs. of  3 variables:
> $ X1: Factor w/ 10 levels "19.4353018286039",..: 8 1 5 7 6 2 9 3 10 4
> $ X2: Factor w/ 2 levels "A","B": 1 2 1 2 1 2 1 2 1 2
> $ X3: Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5 1 2 3 4 5
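
To make the coercion described above concrete, here is a minimal sketch. The names lst, m and df are illustrative and not from the thread, and stringsAsFactors is set explicitly as an assumption so the result matches the 2012 output, since the default became FALSE in R 4.0.0.

```r
# Illustrative list with the same shape as listdat1 in the thread.
set.seed(42)
lst <- list(rnorm(10, 20), rep(LETTERS[1:2], 5), rep(1:5, 2))
names(lst) <- paste0("Var", 1:3)

# cbind() builds a matrix, and a matrix holds a single type,
# so the numeric and integer elements are coerced to character.
m <- do.call("cbind", lst)
typeof(m)   # "character"

# data.frame() on the list keeps each element's own type.
# stringsAsFactors is explicit because its default changed in R 4.0.0.
df <- data.frame(lst, stringsAsFactors = TRUE)
sapply(df, class)
```

The matrix step is where the numeric information is turned into text; going straight from the list to data.frame() never passes through that step.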

Yes, arun. If the coding had proceeded otherwise a more natural and expected result might have occurred:

> dat1<-do.call("data.frame",listdat1)
> colnames(dat1)<-c("Var1","Var2","Var3")
> dat1
       Var1 Var2 Var3
1  21.14076    A    1
2  19.53277    B    2
3  19.59725    A    3
4  19.84262    B    4
5  19.93251    A    5
6  20.92242    B    1
7  19.22315    A    2
8  19.13742    B    3
9  18.82441    A    4
10 20.92661    B    5

Whoever taught you to use 'cbind' for construction of data.frames did you a great disservice. It would seem much less problematic to have simply done this in the first place:

dat1 <- data.frame(Var1=rnorm(10,20),Var2=rep(LETTERS[1:2],5),Var3=rep(1:5,2) )

--David.
> 
> With dat2 you used data.frame() so the numeric fields were not converted to
> strings and then factors. Then you converted the dat1 factors back to
> numeric. You would be fine with just
> 
>> dat1 <- data.frame(listdat1)
>> colnames(dat1) <- paste0("Var", 1:3)
> 
> Or you can name the list elements and then convert
> 
>> names(listdat1) <- paste0("Var", 1:3)
>> dat1 <- data.frame(listdat1)
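
As a hedged illustration of why converting the factors back to numeric does not restore the original values: as.character() is documented to represent doubles to 15 significant digits, so a round trip through text can change the low-order bits. The value x below is illustrative, not taken from the thread.

```r
x <- 20 + 1/3                      # illustrative value, not from the thread
y <- as.numeric(as.character(x))   # round trip through a 15-digit string

identical(x, y)          # FALSE: the low-order bits differ
isTRUE(all.equal(x, y))  # TRUE: equal within numerical tolerance
```

identical() demands bit-for-bit equality, while all.equal() compares within a tolerance, which is usually the more useful test after any conversion through character strings.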
> 
> ----------------------------------------------
> David L Carlson
> Associate Professor of Anthropology
> Texas A&M University
> College Station, TX 77843-4352
> 
> 
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of arun
>> Sent: Sunday, July 01, 2012 12:56 PM
>> To: R help
>> Subject: [R] list to dataframe conversion-testing for identical
>> 
>> Hi R help,
>> 
>> I was trying to get identical data frames from a list using two methods.
>> 
>> #Suppose my list is:
>> listdat1<-list(rnorm(10,20),rep(LETTERS[1:2],5),rep(1:5,2))
>> #Creating dataframe using cbind
>> 
>> dat1<-data.frame(do.call("cbind",listdat1))
>> colnames(dat1)<-c("Var1","Var2","Var3")
>> #Second dataframe conversion
>> 
>> dat2<-
>> data.frame(Var1=listdat1[[1]],Var2=listdat1[[2]],Var3=listdat1[[3]])
>> 
>> #Structure is different in two datasets
>>> str(dat1)
>> 'data.frame':    10 obs. of  3 variables:
>>  $ Var1: Factor w/ 10 levels "18.6153321029756",..: 5 2 6 8 7 9 1 4 3 10
>>  $ Var2: Factor w/ 2 levels "A","B": 1 2 1 2 1 2 1 2 1 2
>>  $ Var3: Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5 1 2 3 4 5
>>> str(dat2)
>> 'data.frame':    10 obs. of  3 variables:
>>  $ Var1: num  20.3 19.2 20.5 20.9 20.5 ...
>>  $ Var2: Factor w/ 2 levels "A","B": 1 2 1 2 1 2 1 2 1 2
>>  $ Var3: int  1 2 3 4 5 1 2 3 4 5
>> 
>> #Converting structure of dat1 to match dat2 structure
>> dat1<-within(dat1,{Var1<-as.numeric(as.character(Var1))
>>     Var3<-as.integer(Var3)})
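
A side note on the conversion above, offered as a sketch rather than a correction: as.integer() on a factor returns the internal level codes, not the numbers spelled out by the labels. It gives the intended result for Var3 only because the levels "1" to "5" happen to sort into codes 1 to 5. The factor f below is illustrative.

```r
f <- factor(c("10", "20", "10"))   # illustrative factor, not from the thread

as.integer(f)                 # 1 2 1: the internal level codes
as.integer(as.character(f))   # 10 20 10: the values the labels spell out
```

Going through as.character() first, as was done for Var1, is the safer habit for any factor whose labels are numbers.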
>> 
>> head(dat1)
>>       Var1 Var2 Var3
>> 1 20.27193    A    1
>> 2 19.17586    B    2
>> 3 20.53197    A    3
>> 4 20.93615    B    4
>> 5 20.53498    A    5
>> 6 21.02044    B    1
>>> head(dat2)
>>       Var1 Var2 Var3
>> 1 20.27193    A    1
>> 2 19.17586    B    2
>> 3 20.53197    A    3
>> 4 20.93615    B    4
>> 5 20.53498    A    5
>> 6 21.02044    B    1
>> 
>> 
>> #New structure
>>> identical(str(dat1),str(dat2))
>> 'data.frame':    10 obs. of  3 variables:
>>  $ Var1: num  19.9 19 21.2 20.7 20.4 ...
>>  $ Var2: Factor w/ 2 levels "A","B": 1 2 1 2 1 2 1 2 1 2
>>  $ Var3: int  1 2 3 4 5 1 2 3 4 5
>> 'data.frame':    10 obs. of  3 variables:
>>  $ Var1: num  19.9 19 21.2 20.7 20.4 ...
>>  $ Var2: Factor w/ 2 levels "A","B": 1 2 1 2 1 2 1 2 1 2
>>  $ Var3: int  1 2 3 4 5 1 2 3 4 5
>> [1] TRUE
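
One caution about the comparison above, shown as a small sketch with made-up data frames a and b: str() prints its summary and returns NULL invisibly, so identical(str(...), str(...)) compares NULL with NULL and cannot fail.

```r
a <- data.frame(x = 1:3)                # illustrative data, not from the thread
b <- data.frame(y = c("p", "q", "r"))

# Both calls print a summary and return NULL,
# so this is TRUE for any two objects.
identical(str(a), str(b))   # TRUE, although a and b clearly differ

# Comparing the column classes tests the structure itself.
identical(sapply(a, class), sapply(b, class))   # FALSE
```

Comparing sapply(dat1, class) with sapply(dat2, class) would be one way to check that the column types match.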
>> 
>> 
>> 
>> #Structure is identical and the data frames look the same, but they are
>> #not identical.
>>> identical(dat1,dat2)
>> [1] FALSE
>> 
>> 
>> Is it something to do with floating point?
>> 
>> Thanks,
>> 
>> A.K.
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-
>> guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 

David Winsemius, MD
West Hartford, CT


