[R] list to dataframe conversion-testing for identical
    David Winsemius 
    dwinsemius at comcast.net
       
    Mon Jul  2 00:31:45 CEST 2012
    
    
  
On Jul 1, 2012, at 5:09 PM, David L Carlson wrote:
> Yes it does have something to do with the representation of floating point
> numbers. Using cbind() forces the list to become a matrix and that forces
> all of the data to become character strings since one of the list elements
> is character:
>
>> set.seed(42)
>> listdat1<-list(rnorm(10,20),rep(LETTERS[1:2],5),rep(1:5,2))
>> str(do.call("cbind", listdat1))
> chr [1:10, 1:3] "21.3709584471467" "19.4353018286039" ...
> Then you convert that to a data.frame. The default in data.frame() is to
> convert characters to factors so you get
>
>> str(data.frame(do.call("cbind",listdat1)))
> 'data.frame':   10 obs. of  3 variables:
> $ X1: Factor w/ 10 levels "19.4353018286039",..: 8 1 5 7 6 2 9 3 10 4
> $ X2: Factor w/ 2 levels "A","B": 1 2 1 2 1 2 1 2 1 2
> $ X3: Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5 1 2 3 4 5
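
To make the mechanism described above concrete, here is a minimal sketch, assuming the same list as in the original post with a fixed seed. The names m, x and x_back are mine, for illustration only. It checks that cbind() yields a character matrix and that the number-to-string-to-number round trip agrees only within a tolerance, which is why identical() and all.equal() can disagree:

```r
# Assumption: same list as in the original post, seeded for reproducibility.
set.seed(42)
listdat1 <- list(rnorm(10, 20), rep(LETTERS[1:2], 5), rep(1:5, 2))

# A matrix holds a single type, so cbind() coerces every column to character.
m <- do.call("cbind", listdat1)
is.character(m)

# Round trip: double -> character (about 15 significant digits) -> double.
x      <- listdat1[[1]]
x_back <- as.numeric(m[, 1])

identical(x, x_back)          # exact comparison; typically FALSE after the round trip
isTRUE(all.equal(x, x_back))  # comparison within a numerical tolerance
max(abs(x - x_back))          # size of the largest discrepancy
```

If the two data frames only need to agree up to rounding, all.equal() is the comparison to use; identical() demands exact equality.
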
Yes, arun. Had the data frame been built directly from the list, without
going through cbind(), a more natural and expected result would have occurred:
 > dat1<-do.call("data.frame",listdat1)
 > colnames(dat1)<-c("Var1","Var2","Var3")
 > dat1
        Var1 Var2 Var3
1  21.14076    A    1
2  19.53277    B    2
3  19.59725    A    3
4  19.84262    B    4
5  19.93251    A    5
6  20.92242    B    1
7  19.22315    A    2
8  19.13742    B    3
9  18.82441    A    4
10 20.92661    B    5
Whoever taught you to use 'cbind' for construction of data.frames did  
you a great disservice. It would seem much less problematic to have  
simply done this in the first place:
dat1 <- data.frame(Var1=rnorm(10,20), Var2=rep(LETTERS[1:2],5), Var3=rep(1:5,2))
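
As a check on the two direct constructions discussed in this thread, the following is a small sketch, assuming the list from the original post with a fixed seed. One assumption to flag: stringsAsFactors is set explicitly because its default was TRUE when this thread was written and has been FALSE since R 4.0.0.

```r
# Assumption: same list as in the original post, seeded and named up front.
set.seed(42)
listdat1 <- list(rnorm(10, 20), rep(LETTERS[1:2], 5), rep(1:5, 2))
names(listdat1) <- c("Var1", "Var2", "Var3")

# Route 1: hand the list elements to data.frame() as arguments.
dat1 <- do.call("data.frame", c(listdat1, list(stringsAsFactors = TRUE)))

# Route 2: spell the columns out one by one.
dat2 <- data.frame(Var1 = listdat1[[1]],
                   Var2 = listdat1[[2]],
                   Var3 = listdat1[[3]],
                   stringsAsFactors = TRUE)

sapply(dat1, class)    # column classes, no detour through character
identical(dat1, dat2)  # both routes make the same data.frame() call
```

Neither route passes through a matrix, so the numeric and integer columns keep their types and no conversion back is needed.
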
-- 
David.
>
> With dat2 you used data.frame() so the numeric fields were not converted to
> strings and then factors. Then you converted the dat1 factors back to
> numeric. You would be fine with just
>
>> dat1 <- data.frame(listdat1)
>> colnames(dat1) <- paste0("Var", 1:3)
>
> Or you can name the list elements and then convert
>
>> names(listdat1) <- paste0("Var", 1:3)
>> dat1 <- data.frame(listdat1)
>
> ----------------------------------------------
> David L Carlson
> Associate Professor of Anthropology
> Texas A&M University
> College Station, TX 77843-4352
>
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of arun
>> Sent: Sunday, July 01, 2012 12:56 PM
>> To: R help
>> Subject: [R] list to dataframe conversion-testing for identical
>>
>> Hi R help,
>>
>> I was trying to get identical data frames from a list using two methods.
>>
>> #Suppose my list is:
>> listdat1<-list(rnorm(10,20),rep(LETTERS[1:2],5),rep(1:5,2))
>> #Creating dataframe using cbind
>>
>> dat1<-data.frame(do.call("cbind",listdat1))
>> colnames(dat1)<-c("Var1","Var2","Var3")
>> #Second dataframe conversion
>>
>> dat2<-data.frame(Var1=listdat1[[1]],Var2=listdat1[[2]],Var3=listdat1[[3]])
>>
>> #Structure is different in two datasets
>>  >str(dat1)
>> 'data.frame':    10 obs. of  3 variables:
>>  $ Var1: Factor w/ 10 levels "18.6153321029756",..: 5 2 6 8 7 9 1 4 3 10
>>  $ Var2: Factor w/ 2 levels "A","B": 1 2 1 2 1 2 1 2 1 2
>>  $ Var3: Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5 1 2 3 4 5
>>> str(dat2)
>> 'data.frame':    10 obs. of  3 variables:
>>  $ Var1: num  20.3 19.2 20.5 20.9 20.5 ...
>>  $ Var2: Factor w/ 2 levels "A","B": 1 2 1 2 1 2 1 2 1 2
>>  $ Var3: int  1 2 3 4 5 1 2 3 4 5
>>
>> #Converting structure of dat1 to match dat2 structure
>> dat1<-within(dat1,{Var1<-as.numeric(as.character(Var1))
>>     Var3<-as.integer(Var3)})
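
A side note on the conversion just above: as.integer() applied to a factor returns the internal level codes, not the labels. It works for Var3 here only because the labels "1" to "5" happen to sort into the same order as their codes. A minimal sketch with a made-up factor f (hypothetical, not from the post) shows the difference:

```r
# Hypothetical factor whose labels do not coincide with their level codes.
f <- factor(c("10", "20", "5"))

levels(f)                    # levels are sorted as strings
as.integer(f)                # internal codes: positions within levels(f)
as.integer(as.character(f))  # the labels themselves, parsed as integers
```

Going through as.character() first, as was done for Var1, is the safer habit for either column.
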
>>
>> head(dat1)
>>       Var1 Var2 Var3
>> 1 20.27193    A    1
>> 2 19.17586    B    2
>> 3 20.53197    A    3
>> 4 20.93615    B    4
>> 5 20.53498    A    5
>> 6 21.02044    B    1
>>> head(dat2)
>>       Var1 Var2 Var3
>> 1 20.27193    A    1
>> 2 19.17586    B    2
>> 3 20.53197    A    3
>> 4 20.93615    B    4
>> 5 20.53498    A    5
>> 6 21.02044    B    1
>>
>>
>> #New structure
>> identical(str(dat1),str(dat2))
>> 'data.frame':    10 obs. of  3 variables:
>>  $ Var1: num  19.9 19 21.2 20.7 20.4 ...
>>  $ Var2: Factor w/ 2 levels "A","B": 1 2 1 2 1 2 1 2 1 2
>>  $ Var3: int  1 2 3 4 5 1 2 3 4 5
>> 'data.frame':    10 obs. of  3 variables:
>>  $ Var1: num  19.9 19 21.2 20.7 20.4 ...
>>  $ Var2: Factor w/ 2 levels "A","B": 1 2 1 2 1 2 1 2 1 2
>>  $ Var3: int  1 2 3 4 5 1 2 3 4 5
>> [1] TRUE
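
One caution about the TRUE just above: str() prints its summary and returns NULL invisibly, so identical(str(a), str(b)) compares NULL with NULL and is TRUE for any two objects. A minimal sketch, using two deliberately different toy data frames (df_a and df_b are hypothetical names), along with one possible alternative that compares column classes instead:

```r
# Two toy data frames that clearly differ in content and column type.
df_a <- data.frame(x = 1:3)
df_b <- data.frame(x = c("a", "b", "c"))

# str() is called for its printed output; its return value is NULL.
res_a <- str(df_a)
is.null(res_a)

identical(str(df_a), str(df_b))  # NULL compared with NULL

# One way to compare structure: the class of each column.
identical(sapply(df_a, class), sapply(df_b, class))
```

For comparing content rather than structure, all.equal(dat1, dat2) reports the differences it finds within a numerical tolerance.
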
>>
>>
>>
>> #structure is identical and the data frames look the same, but they are not identical.
>>> identical(dat1,dat2)
>> [1] FALSE
>>
>>
>> Does it have something to do with floating point?
>>
>> Thanks,
>>
>> A.K.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT
    
    