[R] read large amount of data
    Weiwei Shi 
    helprhelp at gmail.com
       
    Mon Jul 18 17:34:51 CEST 2005
    
    
  
Hi,
I have a 2194651 x 135 dataset in which every value is 0, 1, or 2;
the file is bar-delimited.
I used the following approach, which can handle 100,000 lines at a time:
t <- scan('fv', sep='|', nlines=100000)   # values arrive in row-major order
t1 <- matrix(t, nrow=135, ncol=100000)    # filled column-wise: one input line per column
t2 <- t(t1)                               # transpose so rows are records again
t3 <- as.data.frame(t2)
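To get past a single chunk, I suppose the same scan() could be run
against one open connection, so each call resumes where the previous
one stopped; a rough sketch (the chunk handling is only a placeholder):

con <- file('fv', open='r')
repeat {
  v <- scan(con, sep='|', nlines=100000, quiet=TRUE)
  if (length(v) == 0) break               # end of file
  chunk <- as.data.frame(t(matrix(v, nrow=135)))
  ## ... keep or sample the rows needed from 'chunk' here ...
}
close(con)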
I have changed my plan to stratified sampling with replacement (column
2 is my class variable: 1 or 2). The class distribution looks like this:
awk -F\| '{print $2}' fv | sort | uniq -c
2162792 1
  31859 2
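For what it is worth, the same counts should be obtainable from within
R by reading only column 2 (colClasses='NULL' makes read.table skip a
column; the variable name cls is mine):

cls <- read.table('fv', sep='|',
                  colClasses=c('NULL', 'integer', rep('NULL', 133)))
table(cls[[1]])                           # same counts as the awk output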
Is it possible to use R to read the whole dataset and do the
stratified sampling, or is that really limited by my memory size? (A
full double-precision copy would need 2194651 x 135 x 8 bytes, roughly
2.4 GB.)
Mem:   3111736k total,  1023040k used,  2088696k free,   150160k buffers
Swap:  4008208k total,    19040k used,  3989168k free,   668892k cached
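Once the full data frame is in memory, the sampling step I picture is
something like the sketch below, assuming the data sit in a data frame
d with the class in column V2 (the per-class size is made up):

n.each <- 10000                           # per-class sample size, made up
idx <- unlist(lapply(split(seq_len(nrow(d)), d$V2),
                     function(i) sample(i, n.each, replace=TRUE)))
d.sample <- d[idx, ]                      # stratified sample with replacement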
Thanks,
weiwei
-- 
Weiwei Shi, Ph.D
"Did you always know?"
"No, I did not. But I believed..."
---Matrix III
    
    