[R] apply on large arrays
Erich Neuwirth
erich.neuwirth at univie.ac.at
Thu Feb 14 19:46:27 CET 2008
> system.time({
+ tab2 <- tab1 <- with(pisa1, table(CNT,GENDER,ISCOF,ISCOM))
+ tab2[] <- 0
+ tab2[which(tab1 == 1, arr.ind = TRUE)] <- 1
+ tab3 <- rowSums(tab2)
+ })
   user  system elapsed
   3.17    0.99    4.17
>
> system.time({
+ tab4 <- rowSums(tab1 == 1)
+ })
   user  system elapsed
   1.02    0.18    1.20
>
And yes, the results were identical.
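
For anyone who wants to check the equivalence without the PISA data, here is a minimal sketch on a small synthetic table. The data frame `toy` and the names `res_apply`, `res_which`, and `res_direct` are made up for illustration; only the three computations themselves come from this thread. The last version works because `rowSums` with its default `dims = 1` sums over all dimensions except the first, and a logical array is summed as 0/1.

```r
# Small synthetic stand-in for pisa1 (hypothetical data, not the PISA file).
set.seed(1)
n <- 200
toy <- data.frame(
  CNT    = factor(sample(c("AUT", "DEU", "FRA"), n, replace = TRUE)),
  GENDER = factor(sample(c("f", "m"), n, replace = TRUE)),
  ISCOF  = factor(sample(1:6, n, replace = TRUE)),
  ISCOM  = factor(sample(1:6, n, replace = TRUE))
)
tab1 <- with(toy, table(CNT, GENDER, ISCOF, ISCOM))

# Original approach: call a function on every single cell, then sum per country.
tab2 <- apply(tab1, 1:4, function(x) ifelse(sum(x) == 1, 1, 0))
res_apply <- apply(tab2, 1, sum)

# Second approach: build a 0/1 indicator array via which(), then rowSums.
ind <- tab1
ind[] <- 0
ind[which(tab1 == 1, arr.ind = TRUE)] <- 1
res_which <- rowSums(ind)

# Third approach: sum the logical array directly over dimensions 2 to 4.
res_direct <- rowSums(tab1 == 1)

# All three should agree cell for cell.
stopifnot(all(res_apply == res_which), all(res_which == res_direct))
```

On a table this small the timings are meaningless; the sketch only confirms that the three versions return the same counts per level of the first factor.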
Bill.Venables at csiro.au wrote:
> Was the answer the same as the one you were getting with the original
> code?
>
> How long did the original code take compared to these two versions?
>
> Cheers,
> Bill V.
>
>
> Bill Venables
> CSIRO Laboratories
> PO Box 120, Cleveland, 4163
> AUSTRALIA
> Office Phone (email preferred): +61 7 3826 7251
> Fax (if absolutely necessary): +61 7 3826 7304
> Mobile: +61 4 8819 4402
> Home Phone: +61 7 3286 7700
> mailto:Bill.Venables at csiro.au
> http://www.cmis.csiro.au/bill.venables/
>
> -----Original Message-----
> From: Erich Neuwirth [mailto:erich.neuwirth at univie.ac.at]
> Sent: Thursday, 14 February 2008 5:08 PM
> To: Venables, Bill (CMIS, Cleveland)
> Subject: Re: [R] apply on large arrays
>
> Thanks, this version is definitely faster than the first one.
> system.time gives 0.13 instead of 0.79 seconds.
>
>
>
> Bill.Venables at csiro.au wrote:
>> Hmm. I think this could be faster still:
>>
>> tab1 <- with(pisa1, table(CNT,GENDER,ISCOF,ISCOM))
>> tab3 <- rowSums(tab1 == 1)
>>
>> but check it...
>>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
>> On Behalf Of Venables, Bill (CMIS, Cleveland)
>> Sent: Thursday, 14 February 2008 10:30 AM
>> To: erich.neuwirth at univie.ac.at; r-help at stat.math.ethz.ch
>> Subject: Re: [R] apply on large arrays
>>
>> Your code is
>>
>>
>> tab1 <- with(pisa1, table(CNT,GENDER,ISCOF,ISCOM))
>> tab2 <- apply(tab1, 1:4,
>>               function(x) ifelse(sum(x) == 1, 1, 0))
>> tab3 <- apply(tab2, 1, sum)
>>
>> As far as I can see, step 2 (the problematic one) merely replaces any
>> entries in tab1 that are not equal to one by zeros. I think this would
>> do the same job a bit faster:
>>
>> tab2 <- tab1 <- with(pisa1, table(CNT,GENDER,ISCOF,ISCOM))
>> tab2[] <- 0
>> tab2[which(tab1 == 1, arr.ind = TRUE)] <- 1
>> tab3 <- rowSums(tab2)
>>
>> If you don't need to keep tab1, you would make things even better by
>> removing it.
>>
>> Bill Venables.
>>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
>> On Behalf Of Erich Neuwirth
>> Sent: Thursday, 14 February 2008 9:52 AM
>> To: r-help
>> Subject: [R] apply on large arrays
>>
>> I have a big contingency table, approximately of size 60*2*500*500,
>> and I need to count the number of cells containing a count of 1 for each
>> of the factor values defining the first dimension.
>> Here is my attempt:
>>
>> tab1<-with(pisa1,table(CNT,GENDER,ISCOF,ISCOM))
>> tab2<-apply(tab1,1:4,function(x)ifelse(sum(x)==1,1,0))
>> tab3<-apply(tab2,1,sum)
>>
>> Computing tab2 is very slow.
>> Is there a faster and/or more elegant way of doing this?
>
--
Erich Neuwirth, University of Vienna
Faculty of Computer Science
Computer Supported Didactics Working Group
Visit our SunSITE at http://sunsite.univie.ac.at
Phone: +43-1-4277-39464 Fax: +43-1-4277-39459