[R] sqldf for Very Large Tab Delimited Files

Gabor Grothendieck ggrothendieck at gmail.com
Fri Feb 3 04:52:54 CET 2012


On Thu, Feb 2, 2012 at 8:07 PM, HC <hcatbr at yahoo.co.in> wrote:
> Hi Gabor,
>
> Thank you very much for your guidance and help.
>
> I could run the following code successfully on a 500 MB test data file. A
> snapshot of the data file is attached.
>
> ********code start***************
> library(sqldf)
> library(RSQLite)
>
> iFile <- "Test100.txt"
> con <- dbConnect(SQLite(), dbname = "myTest100")
> dbWriteTable(con, "TestDB100", iFile, sep = "\t") #, eol = "\r\n")
> nms <- names(dbGetQuery(con, "select * from TestDB100 limit 0"))
>
> nRec <- dbGetQuery(con, "select count(*) from TestDB100")[1, 1]
> aL1 <- 0  # 0-based row offset of the first record of the current group
>
> while (aL1 < nRec) {
>   # fetch one row at the current offset and note its id
>   res1 <- fn$dbGetQuery(con, "select * from TestDB100 limit $aL1, 1")
>   istn <- res1[1, 1]
>   # pull every row with that id and write them out to their own file
>   res1 <- fn$dbGetQuery(con, "select * from TestDB100 where `nms[1]` = '$istn'")
>   icount <- nrow(res1)
>   oFile <- paste(istn, "_Test.txt", sep = "")
>   write.table(res1, oFile, sep = "\t", quote = FALSE,
>               col.names = FALSE, row.names = FALSE)
>   aL1 <- aL1 + icount
> }
> dbDisconnect(con)
> ********code end***************
>
> However, the actual data file that I want to handle is about *160 GB*, and
> when I use the same code on that file, the dbWriteTable(con, ...) statement
> gives the following error:
> ********error start**************
> dbWriteTable(con, "TestDB", iFile, sep = "\t") #, eol = "\r\n")
> Error in try({ : RS-DBI driver: (RS_sqlite_getline could not realloc)
> [1] FALSE
> ********error end**************
>
> I am not sure about the reason for this error. Is it due to the file size?
> I understood from the sqldf web page that SQLite can work with files even
> larger than this and is restricted only by disk space, not RAM. I have
> about 400 GB of free space on the PC I am using, with Windows 7 as the
> operating system, so I am assuming that the above dbWriteTable command is
> using only disk space and that this is not the issue.
>
> In fact, this file was created using mysqldump, and I do not have access
> to the original MySQL database.
> I want to know the following:
> (1)  Am I missing something in the above code that prevents it from
> handling this 160 GB file?
> (2)  Should this be handled outside of R, if R is the limitation here? And
> if yes, what is a possible way forward?
>
> Thank you again for your quick response and all the help.
> HC
> http://r.789695.n4.nabble.com/file/n4353362/Test100.txt Test100.txt
>

I think it's unlikely SQLite could handle a database that large unless
you can divide it into multiple separate databases.  At one time the
SQLite site said it did not handle databases over 1 GB, and although I
think that has been made outdated by more recent versions of SQLite, it
is still likely that your file is too large for it.


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


