[R] ff package: reading selected columns from csv
Jan van der Laan
rhelp at eoos.dds.nl
Thu Jul 26 09:58:14 CEST 2012
Having had a quick look at the source code of read.table.ffdf, I
suspect that using 'NULL' in the colClasses argument is not supported.
Could you check whether read.table.ffdf works when you specify the
colClasses for all columns (thereby reading in all columns of the
file)? If that works, you can be fairly sure that the number of
columns is indeed constant throughout the file (sometimes a stray '
or an unquoted , can mess things up).
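
As a rough sketch of that check (not tested against the original file): count.fields() from base R can confirm that the number of fields per line is constant, and read.csv.ffdf can then be called with a class for every column. The generated stand-in file, the names n.fields, all.class and v2, and the assumption that all seven columns are numeric are illustrative only and do not come from the original post.

```r
# Sketch only: a small stand-in file is generated so the example is
# self-contained; in practice 'csvfile' would point at the real file.
set.seed(1)
csvfile <- tempfile(fileext = ".csv")
m <- matrix(round(rnorm(120 * 7), 4), ncol = 7)
write.table(m, csvfile, sep = ",", row.names = FALSE, col.names = FALSE)

# 1. Count the fields on every line after the skipped rows.
#    quote = "\"" mirrors the read.csv default, so a stray ' is not
#    treated as a quote character here.
n.fields <- count.fields(csvfile, sep = ",", quote = "\"", skip = 100)
print(table(n.fields, useNA = "ifany"))  # one distinct value means constant width

# 2. Read all seven columns with explicit classes (no 'NULL' entries)
#    and pick out the second column afterwards.
#    Assumption: every column is numeric; adjust all.class otherwise.
if (requireNamespace("ff", quietly = TRUE)) {
  library(ff)
  all.class <- rep("numeric", 7)
  x <- read.csv.ffdf(file = csvfile, header = FALSE, skip = 100,
                     colClasses = all.class)
  v2 <- x$V2       # ff vector holding only the second column
  print(v2[1:5])
}
```

If the table printed in step 1 shows more than one field count (or NA), the offending lines can be located with which(n.fields != 7) before trying the ff import again.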
Jan
threshold <r.kozarski at gmail.com> wrote:
> Dear R users, I've just started using the ff package.
>
> There is a csv file (~4Gb) with 7 columns and 6e+7 rows. I want to read only
> one column from the file, skipping the first 100 rows.
> Below I've provided the different outcomes, which should clarify my problem.
>
>> sessionInfo()
> R version 2.14.2 (2012-02-29)
> Platform: x86_64-pc-mingw32/x64 (64-bit)
>
> locale:
> ...
>
> attached base packages:
> [1] tools stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] ff_2.2-7 bit_1.1-8
>
> ##---------------------------------------------------------------------------------------
> ## I want to read the second column only:
> x.class <- c('NULL', 'numeric','NULL','NULL','NULL', 'NULL', 'NULL')
>
> ## The following command works fine:
>
>> read.csv.ffdf(file=csvfile, header=FALSE, skip=100,
>> colClasses=x.class, nrows=1e3)
> ffdf (all open) dim=c(1000,1), dimorder=c(1,2) row.names=NULL
> ffdf virtual mapping
> PhysicalName VirtualVmode PhysicalVmode AsIs VirtualIsMatrix
> V2 V2 double double FALSE FALSE
> PhysicalIsMatrix PhysicalElementNo PhysicalFirstCol PhysicalLastCol
> V2 FALSE 1 1 1
> PhysicalIsOpen
> V2 TRUE
> ffdf data
> V2
> 1 -0.5412
> 2 -0.5842
> 3 -0.5920
> 4 -0.5451
> 5 -0.5099
> 6 -0.5021
> 7 -0.4943
> 8 -0.5490
> : :
> 993 -0.4865
> 994 -0.6584
> 995 -0.7482
> 996 -0.8732
> 997 -0.8303
> 998 -0.7248
> 999 -0.5490
> 1000 -0.4240
>
> When I then increase nrows by 1, I get a warning about the number of columns:
>
>> read.csv.ffdf(file=csvfile, header=FALSE, skip=100,
>> colClasses=x.class, nrows=1001)
> ffdf (all open) dim=c(1001,1), dimorder=c(1,2) row.names=NULL
> ffdf virtual mapping
> PhysicalName VirtualVmode PhysicalVmode AsIs VirtualIsMatrix
> V2 V2 double double FALSE FALSE
> PhysicalIsMatrix PhysicalElementNo PhysicalFirstCol PhysicalLastCol
> V2 FALSE 1 1 1
> PhysicalIsOpen
> V2 TRUE
> ffdf data
> V2
> 1 -0.5412
> 2 -0.5842
> 3 -0.5920
> 4 -0.5451
> 5 -0.5099
> 6 -0.5021
> 7 -0.4943
> 8 -0.5490
> : :
> 994 -0.6584
> 995 -0.7482
> 996 -0.8732
> 997 -0.8303
> 998 -0.7248
> 999 -0.5490
> 1000 -0.4240
> 1001 -0.3849
> Warning message:
> In read.table(file = file, header = header, sep = sep, quote = quote, :
> cols = 1 != length(data) = 7
>>
>
> Going well beyond 1000 rows then fails with an error:
>> read.csv.ffdf(file=csvfile, header=FALSE, skip=100,
>> colClasses=x.class, nrows=1e4)
> Error in read.table(file = file, header = header, sep = sep, quote = quote, :
> more columns than column names
>
> The question is why? The number of columns does not change in the file...
>
> I would appreciate any help.
>
> Best, Robert