[R] list to dataframe conversion-testing for identical
arun
smartpink111 at yahoo.com
Mon Jul 2 02:18:13 CEST 2012
Hi all,
Thanks for your replies.
A.K.
----- Original Message -----
From: David Winsemius <dwinsemius at comcast.net>
To: arun <smartpink111 at yahoo.com>
Cc: R help <r-help at r-project.org>
Sent: Sunday, July 1, 2012 6:31 PM
Subject: Re: [R] list to dataframe conversion-testing for identical
On Jul 1, 2012, at 5:09 PM, David L Carlson wrote:
> Yes, it does have something to do with the representation of floating point
> numbers. Using cbind() forces the list to become a matrix and that forces
> all of the data to become character strings since one of the list elements
> is character:
>
>> set.seed(42)
>> listdat1<-list(rnorm(10,20),rep(LETTERS[1:2],5),rep(1:5,2))
>> str(do.call("cbind", listdat1))
> chr [1:10, 1:3] "21.3709584471467" "19.4353018286039" ...
> Then you convert that to a data.frame. The default in data.frame() is to
> convert character strings to factors, so you get
>
>> str(data.frame(do.call("cbind",listdat1)))
> 'data.frame': 10 obs. of 3 variables:
> $ X1: Factor w/ 10 levels "19.4353018286039",..: 8 1 5 7 6 2 9 3 10 4
> $ X2: Factor w/ 2 levels "A","B": 1 2 1 2 1 2 1 2 1 2
> $ X3: Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5 1 2 3 4 5
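A minimal sketch of the coercion described above, assuming the same list as in the original post and the seed used in this reply. It checks the type of the matrix that cbind() produces and compares the first column after a round trip through character strings. Note that since R 4.0.0, data.frame() no longer converts strings to factors by default (stringsAsFactors = FALSE), so the factor columns shown above appear only in older R versions or when stringsAsFactors = TRUE is given. Whether identical() is FALSE on a given R version depends on how many digits as.character() writes out, so the sketch prints that result rather than assuming it.

```r
set.seed(42)
listdat1 <- list(rnorm(10, 20), rep(LETTERS[1:2], 5), rep(1:5, 2))

# A matrix holds a single type; because one list element is character,
# cbind() coerces every column to character
m <- do.call("cbind", listdat1)
print(typeof(m))   # "character"

# Turning the strings back into numbers restores them only up to the
# digits that were written out, so compare with a tolerance
x_back <- as.numeric(m[, 1])
print(all.equal(x_back, listdat1[[1]]))   # TRUE
print(identical(x_back, listdat1[[1]]))   # bit-for-bit; may be FALSE
```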
Yes, arun. If the coding had proceeded otherwise, a more natural and expected result would have occurred:
> dat1<-do.call("data.frame",listdat1)
> colnames(dat1)<-c("Var1","Var2","Var3")
> dat1
Var1 Var2 Var3
1 21.14076 A 1
2 19.53277 B 2
3 19.59725 A 3
4 19.84262 B 4
5 19.93251 A 5
6 20.92242 B 1
7 19.22315 A 2
8 19.13742 B 3
9 18.82441 A 4
10 20.92661 B 5
Whoever taught you to use 'cbind' for construction of data.frames did you a great disservice. It would seem much less problematic to have simply done this in the first place:
dat1 <- data.frame(Var1=rnorm(10,20),Var2=rep(LETTERS[1:2],5),Var3=rep(1:5,2) )
--David.
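The direct construction can be sketched as follows. The argument stringsAsFactors = TRUE is an addition, not part of the original line: it makes current R versions (4.0.0 and later, where the default is FALSE) reproduce the 2012 behaviour of turning Var2 into a factor. Checking the column classes confirms that Var1 stays numeric and Var3 stays integer, because nothing passes through a character matrix.

```r
set.seed(42)
# Build each column with its natural type; stringsAsFactors = TRUE only
# mirrors the pre-4.0.0 default
dat1 <- data.frame(Var1 = rnorm(10, 20), Var2 = rep(LETTERS[1:2], 5),
                   Var3 = rep(1:5, 2), stringsAsFactors = TRUE)
print(sapply(dat1, class))
# Var1 "numeric", Var2 "factor", Var3 "integer"
```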
>
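> With dat2 you used data.frame(), so the numeric fields were never converted to
> strings and then factors. For dat1 you converted the factors back to
> numeric, but the strings hold only the 15 significant digits shown above, so
> the round trip need not reproduce the original doubles exactly. You would be
> fine with just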
>
>> dat1 <- data.frame(listdat1)
>> colnames(dat1) <- paste0("Var", 1:3)
>
> Or you can name the list elements and then convert
>
>> names(listdat1) <- paste0("Var", 1:3)
>> dat1 <- data.frame(listdat1)
>
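A sketch that checks both suggestions above against the element-by-element construction used for dat2 in the original post. The copy named_list is a hypothetical name introduced so that both options start from the same unnamed list, and stringsAsFactors = TRUE is again added only to mirror the pre-4.0.0 default. Because no numbers pass through character strings here, identical() is expected to return TRUE in both comparisons.

```r
set.seed(42)
listdat1 <- list(rnorm(10, 20), rep(LETTERS[1:2], 5), rep(1:5, 2))

# Option 1: convert the whole list, then set the column names
dat1 <- data.frame(listdat1, stringsAsFactors = TRUE)
colnames(dat1) <- paste0("Var", 1:3)

# Option 2: name the list elements first, then convert
named_list <- listdat1
names(named_list) <- paste0("Var", 1:3)
dat1b <- data.frame(named_list, stringsAsFactors = TRUE)

# Element-by-element construction, as for dat2 in the original post
dat2 <- data.frame(Var1 = listdat1[[1]], Var2 = listdat1[[2]],
                   Var3 = listdat1[[3]], stringsAsFactors = TRUE)

print(identical(dat1, dat2))
print(identical(dat1b, dat2))
```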
> ----------------------------------------------
> David L Carlson
> Associate Professor of Anthropology
> Texas A&M University
> College Station, TX 77843-4352
>
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of arun
>> Sent: Sunday, July 01, 2012 12:56 PM
>> To: R help
>> Subject: [R] list to dataframe conversion-testing for identical
>>
>> Hi R-help,
>>
>> I was trying to get identical data frames from a list using two methods.
>>
>> #Suppose my list is:
>> listdat1<-list(rnorm(10,20),rep(LETTERS[1:2],5),rep(1:5,2))
>> #Creating a data frame using cbind
>>
>> dat1<-data.frame(do.call("cbind",listdat1))
>> colnames(dat1)<-c("Var1","Var2","Var3")
>> #Second dataframe conversion
>>
>> dat2<-data.frame(Var1=listdat1[[1]],Var2=listdat1[[2]],Var3=listdat1[[3]])
>>
>> #The structure differs between the two data frames
>>> str(dat1)
>> 'data.frame': 10 obs. of 3 variables:
>> $ Var1: Factor w/ 10 levels "18.6153321029756",..: 5 2 6 8 7 9 1 4 3 10
>> $ Var2: Factor w/ 2 levels "A","B": 1 2 1 2 1 2 1 2 1 2
>> $ Var3: Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5 1 2 3 4 5
>>> str(dat2)
>> 'data.frame': 10 obs. of 3 variables:
>> $ Var1: num 20.3 19.2 20.5 20.9 20.5 ...
>> $ Var2: Factor w/ 2 levels "A","B": 1 2 1 2 1 2 1 2 1 2
>> $ Var3: int 1 2 3 4 5 1 2 3 4 5
>>
>> #Converting the structure of dat1 to match the dat2 structure
>> dat1<-within(dat1,{Var1<-as.numeric(as.character(Var1))
>> Var3<-as.integer(Var3)})
>>
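A caution on the Var3 line above: as.integer() applied to a factor returns the internal level codes, not the values the labels represent. It gives the right answer in this post only because the levels "1" to "5" sort into the same order as their values. The sketch below uses a hypothetical factor f whose labels do not sort numerically; the last form, indexing the converted levels by the factor, is the idiom recommended in the R FAQ (7.10).

```r
# A hypothetical factor whose labels do not sort in numeric order
f <- factor(c("10", "2", "3"))
print(levels(f))                     # "10" "2" "3"

print(as.integer(f))                 # 1 2 3  (level codes)
print(as.integer(as.character(f)))   # 10 2 3 (values)
print(as.integer(levels(f))[f])      # 10 2 3 (R FAQ 7.10 idiom)
```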
>> head(dat1)
>> Var1 Var2 Var3
>> 1 20.27193 A 1
>> 2 19.17586 B 2
>> 3 20.53197 A 3
>> 4 20.93615 B 4
>> 5 20.53498 A 5
>> 6 21.02044 B 1
>>> head(dat2)
>> Var1 Var2 Var3
>> 1 20.27193 A 1
>> 2 19.17586 B 2
>> 3 20.53197 A 3
>> 4 20.93615 B 4
>> 5 20.53498 A 5
>> 6 21.02044 B 1
>>
>>
>> #New structure
>>> identical(str(dat1),str(dat2))
>> 'data.frame': 10 obs. of 3 variables:
>> $ Var1: num 19.9 19 21.2 20.7 20.4 ...
>> $ Var2: Factor w/ 2 levels "A","B": 1 2 1 2 1 2 1 2 1 2
>> $ Var3: int 1 2 3 4 5 1 2 3 4 5
>> 'data.frame': 10 obs. of 3 variables:
>> $ Var1: num 19.9 19 21.2 20.7 20.4 ...
>> $ Var2: Factor w/ 2 levels "A","B": 1 2 1 2 1 2 1 2 1 2
>> $ Var3: int 1 2 3 4 5 1 2 3 4 5
>> [1] TRUE
>>
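The TRUE printed above does not show that the two structures match: str() prints its summary as a side effect and returns NULL invisibly, so identical(str(dat1), str(dat2)) compares NULL with NULL and is TRUE for any two objects. The sketch uses two hypothetical data frames, a and b, that clearly differ; comparing the column classes is one way to test structure instead (comparing capture.output(str(...)) is another).

```r
# Two hypothetical data frames with different columns and types
a <- data.frame(x = 1:3)
b <- data.frame(y = c("p", "q", "r"))

# str() prints a summary, but its return value is NULL
s <- str(a)
print(is.null(s))                  # TRUE

# So this is TRUE even though a and b differ
print(identical(str(a), str(b)))   # TRUE

# Comparing the column classes does detect the difference
print(identical(vapply(a, class, character(1)),
                vapply(b, class, character(1))))   # FALSE
```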
>>
>>
>> #The structure is identical and the data frames look the same, but they are not identical.
>>> identical(dat1,dat2)
>> [1] FALSE
>>
>>
>> Does it have something to do with floating point?
>>
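One way to answer this question is sketched below. It rebuilds both data frames with set.seed(42) (the seed is borrowed from the reply above, not from the original post), applies identical() column by column with mapply() to locate the mismatch, and then compares with all.equal(), which allows a small numerical tolerance. In the setting of this thread Var1 is the column expected to differ, because it was rebuilt from character strings; the exact outcome of identical() on it can depend on the R version, so the sketch prints it instead of assuming it.

```r
set.seed(42)
listdat1 <- list(rnorm(10, 20), rep(LETTERS[1:2], 5), rep(1:5, 2))

# The cbind() route; stringsAsFactors = TRUE mirrors the pre-4.0.0 default
dat1 <- data.frame(do.call("cbind", listdat1), stringsAsFactors = TRUE)
colnames(dat1) <- c("Var1", "Var2", "Var3")
dat1 <- within(dat1, {
  Var1 <- as.numeric(as.character(Var1))  # factor -> string -> number
  Var3 <- as.integer(as.character(Var3))  # recover values, not level codes
})

# The element-by-element route
dat2 <- data.frame(Var1 = listdat1[[1]], Var2 = listdat1[[2]],
                   Var3 = listdat1[[3]], stringsAsFactors = TRUE)

# Bit-for-bit comparison, one column at a time
print(mapply(identical, dat1, dat2))

# Comparison with a numerical tolerance for the whole data frame
print(all.equal(dat1, dat2))
```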
>> Thanks,
>>
>> A.K.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
David Winsemius, MD
West Hartford, CT