[R] How to read HUGE data sets?
Roland Rau
roland.rproject at gmail.com
Thu Feb 28 22:47:55 CET 2008
Hi,
Jorge Iván Vélez wrote:
> Dear R-list,
>
> Does somebody know how I can read a HUGE data set using R? It is a hapmap
> data set (txt format) which is around 4GB. After reading it, I need to delete
> some specific rows and columns. I'm running R 2.6.2 patched over XP SP2
in such a case, I would recommend not starting with R at all. Try to use
awk[1] to cut out the rows and columns you need. If the resulting data
are still very large, I would suggest reading them into a database
system. My experience in that respect is limited: I have only used
SQLite, but in conjunction with the RSQLite package it handled all my
"big data problems".
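For the database route, a rough sketch with RSQLite could look like the
following. Everything in it is a placeholder (the file name "hapmap.txt",
the database and table names, the tab separator, the chunk size, and the
columns in the example query), so adjust it to your data:

## Read the big text file in chunks and push each chunk into SQLite,
## so the whole 4 GB never has to sit in R's memory at once.
library(RSQLite)

con <- dbConnect(SQLite(), dbname = "hapmap.db")

infile <- file("hapmap.txt", open = "r")
header <- strsplit(readLines(infile, n = 1), "\t")[[1]]
chunk_size <- 100000

repeat {
  chunk <- tryCatch(
    read.table(infile, sep = "\t", nrows = chunk_size,
               col.names = header, stringsAsFactors = FALSE),
    error = function(e) NULL)   ## NULL signals that the file is exhausted
  if (is.null(chunk) || nrow(chunk) == 0) break
  dbWriteTable(con, "hapmap", chunk, append = TRUE, row.names = FALSE)
  if (nrow(chunk) < chunk_size) break
}
close(infile)

## Afterwards, pull out only the rows and columns you actually need.
mysub <- dbGetQuery(con,
  "SELECT col1, col3 FROM hapmap WHERE col2 = 'something'")
dbDisconnect(con)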
Check http://www.ibm.com/developerworks/library/l-awk1.html to get
yourself smoothly started with awk.
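And if you want to see awk and R working together right away, something
along these lines pipes the awk output straight into read.table(). The
column numbers, the value "CEU", and the file names are purely made up,
and you need a gawk or mawk binary on your PATH:

## Write the awk program to its own file; this avoids shell-quoting
## headaches, especially on Windows.
writeLines("$2 == \"CEU\" { print $1, $3, $5 }", "filter.awk")

## awk keeps only the wanted rows/columns, so R never reads the full file.
x <- read.table(pipe("awk -f filter.awk hapmap.txt"),
                header = FALSE, stringsAsFactors = FALSE)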
I hope this helps,
Roland
[1] I think the gawk implementation offers the most options (e.g. for
timing), but I recently used mawk (or was it nawk?) on Windows XP and it
was way faster. If you don't already know a language such as Perl, I'd
say awk is much easier to learn than Perl.