[R] efficient writing of calculation involving each element of 2 data frames.
Uwe Ligges
ligges at statistik.tu-dortmund.de
Mon Feb 25 12:03:39 CET 2008
Vikas N Kumar wrote:
> Hi
>
> I have 2 data.frames each of the same number of rows (approximately 30000 or
> more entries).
> They also have the same number of columns, let's say 2.
> One column has the date, the other column has a double precision number. Let
> the column names be V1, V2.
>
> Now I want to calculate the correlation of the 2 sets of data, for the last
> 100 days for every day available in the data.frames.
>
> My code looks like this :
> # Let df1, and df2 be the 2 data frames with the required data
> ## begin code snippet
>
> my_corr <- c();
> for ( i_end in 100:nrow(df1)) {
> i_start <- i_end - 99;
> my_corr[i_start] <-
> cor(x=df1[i_start:i_end,"V2"],y=df2[i_start:i_end,"V2"])
> }
I'd rather do it this way:
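  n <- nrow(df1) - 99          # number of 100-row windows
  my_corr <- numeric(n)        # preallocate the result vector
  dat1 <- df1[, "V2"]          # extract the columns once, outside the loop
  dat2 <- df2[, "V2"]
  for (i in seq_len(n)) {
      sq <- i:(i + 99)         # row indices of the current window
      my_corr[i] <- cor(x = dat1[sq], y = dat2[sq])
  }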
because most of your time was spent in the indexing function
  [.data.frame
as profiling shows. Type ?Rprof to learn how to do the profiling yourself.
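
For anyone who wants to reproduce the profiling step, here is a minimal sketch. It assumes an R build with profiling support and uses synthetic stand-ins for df1 and df2 (the row count, the seed and the file name prof_file are illustrative assumptions, not taken from the original post):

```r
# Synthetic stand-ins for df1 and df2 (assumed layout: date in V1, numeric in V2)
set.seed(1)
n_rows <- 20000
df1 <- data.frame(V1 = as.Date("2000-01-01") + seq_len(n_rows),
                  V2 = rnorm(n_rows))
df2 <- data.frame(V1 = df1$V1, V2 = rnorm(n_rows))

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)                     # start sampling the call stack

# The original loop, indexing the data frames on every iteration
my_corr <- numeric(n_rows - 99)
for (i_end in 100:n_rows) {
    i_start <- i_end - 99
    my_corr[i_start] <- cor(x = df1[i_start:i_end, "V2"],
                            y = df2[i_start:i_end, "V2"])
}

Rprof(NULL)                          # stop profiling

prof <- summaryRprof(prof_file)
head(prof$by.total, 10)              # functions ranked by total time
```

In the by.total table, check how much of the time is attributed to "[.data.frame" compared with cor itself.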
Uwe Ligges
> ## end of code snippet
>
> This runs very slowly, and takes more than an hour to run if I have to
> calculate correlation between 10 data sets leaving me with 45 runs of this
> snippet or taking more than 30 minutes to run.
>
> Is there an efficient way to write this piece of code where I can get it
> to run faster ?
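
Beyond pulling the column extraction out of the loop, the loop itself can probably be avoided by computing the windowed sums from cumulative sums. The following is only a sketch of that alternative technique, not what was profiled above. It assumes V2 has no missing values and is of moderate magnitude (the sum-of-products formula can lose precision otherwise); rolling_cor is a hypothetical helper name, not a base R function:

```r
# Rolling Pearson correlation over windows of `width` consecutive values,
# built from differences of cumulative sums instead of an explicit loop.
rolling_cor <- function(x, y, width = 100) {
    n <- length(x)
    stopifnot(length(y) == n, n >= width)

    # Sum of v over each window [i, i + width - 1], for i = 1 .. n - width + 1
    wsum <- function(v) {
        cs <- c(0, cumsum(v))
        cs[(width + 1):(n + 1)] - cs[1:(n - width + 1)]
    }

    sx  <- wsum(x);     sy  <- wsum(y)
    sxx <- wsum(x * x); syy <- wsum(y * y); sxy <- wsum(x * y)

    # Centered sums of squares and cross products for each window
    cxy <- sxy - sx * sy / width
    cxx <- sxx - sx^2 / width
    cyy <- syy - sy^2 / width

    cxy / sqrt(cxx * cyy)
}

# Small synthetic check against the explicit loop
set.seed(42)
x <- rnorm(1000)
y <- rnorm(1000)
loop_result <- sapply(seq_len(length(x) - 99),
                      function(i) cor(x[i:(i + 99)], y[i:(i + 99)]))
all.equal(rolling_cor(x, y, 100), loop_result)
```

With the data frames from the question, the call would be rolling_cor(df1[, "V2"], df2[, "V2"]), which gives one value per 100-row window in the same order as my_corr.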
>
> If I do something similar in Excel, it is much faster. But I have to use R,
> since this is a part of a bigger program.
>
> Any help will be appreciated.
>
> Thanks and Regards
> Vikas
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.