[R] how to parallelize 'apply' across multiple cores on a Mac
David Romano
dromano at stanford.edu
Sat May 4 20:27:56 CEST 2013
(I neglected to use reply-all.)
---------- Forwarded message ----------
From: David Romano <dromano at stanford.edu>
Date: Sat, May 4, 2013 at 11:25 AM
Subject: Re: [R] how to parallelize 'apply' across multiple cores on a Mac
To: Charles Berry <ccberry at ucsd.edu>
On Sat, May 4, 2013 at 9:32 AM, Charles Berry <ccberry at ucsd.edu> wrote:
> David,
>
> If you insist on explicitly parallelizing this:
>
> The functions in the recommended package 'parallel' work on a Mac.
>
> I would not try to work on each tiny column as a separate function call -
> too much overhead if you parallelize - instead, bundle up 100-1000 columns
> to operate on.
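A minimal sketch of this bundling idea with 'parallel::mclapply' (the matrix 'm', the per-column function 'col_stat', and the chunk size are placeholder choices for illustration, not from the original thread):

```r
library(parallel)

# Toy stand-ins for the real data and per-column calculation
m <- matrix(rnorm(1e4), nrow = 10)    # 10 x 1000 matrix
col_stat <- function(x) sum(x^2)      # cheap per-column statistic

# Split the column indices into bundles of ~250 columns each
chunk_size <- 250
chunks <- split(seq_len(ncol(m)),
                ceiling(seq_len(ncol(m)) / chunk_size))

# Each worker call processes a whole bundle, amortizing the fork overhead
res <- mclapply(chunks, function(idx) {
  apply(m[, idx, drop = FALSE], 2, col_stat)
}, mc.cores = 2)

out <- unlist(res, use.names = FALSE)  # one value per column of m
```

On a Mac, 'mclapply' forks, so the workers share the parent's copy of 'm' rather than each receiving a serialized copy.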
>
> The calcs you describe sound simple enough that I would just write
> them in C and use the .Call interface to invoke them. You only need enough
> working memory in C to operate on one column and space to save the result.
>
> So a MacBook with 8GB of memory will handle it with room to breathe.
>
> This is a good use case for the 'inline' package, especially if you are
> unfamiliar with the use of .Call.
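For readers unfamiliar with 'inline', here is a toy example of the kind of thing Charles means: a small C kernel compiled on the fly and invoked via .Call. The sum-of-squares calculation is illustrative only, standing in for the poster's actual per-column calculation.

```r
library(inline)

# Toy .Call kernel: sum of squares of a numeric vector.
# inline::cfunction compiles the C body and returns a callable R function.
sumsq <- cfunction(signature(x = "numeric"), body = "
  R_xlen_t n = XLENGTH(x);
  double *px = REAL(x), s = 0.0;
  for (R_xlen_t i = 0; i < n; i++) s += px[i] * px[i];
  return Rf_ScalarReal(s);
")

sumsq(c(1, 2, 3))   # 14
```

This requires a working C toolchain (Xcode command-line tools on a Mac).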
>
>
> ===
>
> But it might be as fast to forget about parallelizing this (explicitly).
>
[detailed recommendations deleted]
>
> On a Mac, the vecLib BLAS will do crossprod using the multiple
> cores without your needing to do anything special. So you can forget about
> 'parallel', 'multicore', etc.
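For example (toy dimensions, not the poster's data), a single 'crossprod' call replaces a column-by-column loop and hands the whole multiplication to the BLAS:

```r
# crossprod(X) computes t(X) %*% X in one BLAS call; with a multithreaded
# BLAS such as vecLib on macOS, the cores are used automatically.
X <- matrix(rnorm(2000 * 200), nrow = 2000)
G <- crossprod(X)   # 200 x 200 Gram matrix, equivalent to t(X) %*% X
```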
>
>
> So your remaining problem is to reread steps 2-6 and figure out what
> 'minimal.matrix' and 'fill.rows' have to be.
>
> ===
>
> You can also approach this problem using 'filter', but that can get
> 'convoluted' (pun intended - see ?filter).
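A small illustration of 'filter' as a moving-average convolution (toy data; see ?filter for the full set of options):

```r
# stats::filter applies a moving (convolution) filter along a series;
# here, a centered 3-point moving average. When given a matrix, filter
# operates on each column at once, which is why it fits this thread.
x <- c(1, 2, 4, 8, 16)
ma3 <- stats::filter(x, rep(1/3, 3), sides = 2)
# interior values are (1+2+4)/3, (2+4+8)/3, (4+8+16)/3; the ends are NA
```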
>
> HTH,
Thanks, Charles, for all the helpful pointers! For the moment, I'll
leave parallelization aside and explore using 'crossprod' and
'filter'. Given your suggestion that 8 GB of memory should be
sufficient if I went the parallel route, I also wonder whether I'm
suffering not just from inefficient use of computing resources but
from a memory leak as well: the original 'apply' code would, in much
less than a minute, take over the full 18 GB of memory available on
my workstation, and then leave it functioning at a crawl for at least
half an hour. I'll repost this question under a different subject,
so no need to address it in this thread.
Thanks again,
David