[R] loop in a data.table

Camilo Mora cmora at DAL.CA
Thu Mar 14 04:27:53 CET 2013


I would like to clarify my previous email about using data.table.

Imagine the following data.frame called "data":

a     b       c      d     e
1     12     15     65     6
1     65     85     36     5
2     69     84     35     8
2     45     78     65     8

I want to aggregate columns b to d, grouped by the values in column a.
Within each group, the aggregation for each column is sum(col)/sum(e),
where col is one of b, c or d.
For this I am using a data.table with a loop of the form:

##########################################

library(data.table)

ColNames <- colnames(data)   # names of the columns

x <- ncol(data) - 1    # index of the last column to process (d); column e is excluded

data <- data.table(data)     # converts to data.table

outputdata <- list()   # one result per column, so iterations do not overwrite each other

for (z in 2:x)  # start at the second column (b) and finish at column d
{
  outputdata[[ColNames[z]]] <- data[, sum(get(ColNames[z])) / sum(e), by = "a"]
}
############################################
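
As a cross-check on what the loop should return for the sample data, here is a minimal base R sketch (no data.table, and not tuned for speed). The data is retyped from the table above; `df`, `sums`, `e_total` and `ratios` are names made up for this illustration only.

```r
# Sample data retyped from the table in this message.
df <- data.frame(
  a = c(1, 1, 2, 2),
  b = c(12, 65, 69, 45),
  c = c(15, 85, 84, 78),
  d = c(65, 36, 35, 65),
  e = c(6, 5, 8, 8)
)

# Per-group column sums for b, c and d (one row per value of a).
sums <- rowsum(as.matrix(df[, c("b", "c", "d")]), group = df$a)

# Per-group sum of e, as a vector with one element per value of a.
e_total <- rowsum(df$e, group = df$a)[, 1]

# Row i of sums is divided by element i of e_total
# (recycling runs down the rows of each column).
ratios <- sums / e_total
print(ratios)
```

For the group a = 1, for example, column b should come out as (12 + 65) / (6 + 5) = 7.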


This works fine, but the function "get" slows down the aggregation of the
rows by about 20 times. I wonder if there is an alternative function
to "get", or an alternative way to aggregate all columns at once. I have
been reading about .SD but have not yet figured out how to put
more than one operation in the function.

Right now I have:
###############
outputdata <- data[, lapply(.SD, sum), by = "a", .SDcols = 2:x]

##############
This latter code aggregates all columns at once, but only by summing.
Eventually I also need to divide the sum of each column by the sum of
column e.
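
One direction that might work is sketched below (a minimal sketch, not benchmarked). It assumes that columns outside .SDcols, such as e, are still visible by name inside j, so the per-group sum of e can be computed once and used as the divisor inside lapply(.SD, ...). The sample data is retyped from the table above; `value_cols`, `e_total` and `ratios` are hypothetical names introduced for illustration.

```r
library(data.table)

# Sample data retyped from the table in this message.
data <- data.table(
  a = c(1, 1, 2, 2),
  b = c(12, 65, 69, 45),
  c = c(15, 85, 84, 78),
  d = c(65, 36, 35, 65),
  e = c(6, 5, 8, 8)
)

# Columns to aggregate; e is deliberately left out of .SD.
value_cols <- c("b", "c", "d")

ratios <- data[, {
  e_total <- sum(e)                               # per-group sum of e
  lapply(.SD, function(col) sum(col) / e_total)   # sum(col)/sum(e) for each column in .SD
}, by = a, .SDcols = value_cols]

print(ratios)
```

This avoids get() entirely, since the columns are selected once through .SDcols rather than looked up by name in every iteration.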

Any help will be greatly appreciated.

Thanks,

Camilo






Camilo Mora, Ph.D.
Department of Geography, University of Hawaii
Currently available in Colombia
Phone:   Country code: 57
          Provider code: 313
          Phone 776 2282
          From the USA or Canada you have to dial 011 57 313 776 2282
http://www.soc.hawaii.edu/mora/


