[R] ff package: reading selected columns from csv
threshold
r.kozarski at gmail.com
Wed Jul 25 17:48:50 CEST 2012
*Dear R users, I've just started using the ff package.
There is a csv file (~4 GB) with 7 columns and 6e+7 rows. I want to read only
one column (the second) from the file, skipping the first 100 rows.
Below I've provided the different outcomes, which should clarify my problem.
*
> sessionInfo()
R version 2.14.2 (2012-02-29)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
...
attached base packages:
[1] tools stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] ff_2.2-7 bit_1.1-8
##---------------------------------------------------------------------------------------
## *I want to read the second column only:*
x.class <- c('NULL', 'numeric','NULL','NULL','NULL', 'NULL', 'NULL')
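For illustration, here is a minimal base-R sketch of how a colClasses vector like the one above selects a single column. It uses plain read.csv on a small temporary file rather than ff; the toy data, the file, and the smaller skip and nrows values are assumptions made up for the example.

```r
# Hypothetical toy file standing in for the real 7-column csv.
tmp <- tempfile(fileext = ".csv")
set.seed(1)
toy <- as.data.frame(matrix(round(rnorm(20 * 7), 4), ncol = 7))
write.table(toy, tmp, sep = ",", row.names = FALSE, col.names = FALSE)

# 'NULL' drops a column while parsing; 'numeric' keeps it as double.
x.class <- c('NULL', 'numeric', 'NULL', 'NULL', 'NULL', 'NULL', 'NULL')

# Skip the first 5 lines, then read 10 rows of the second column only.
res <- read.csv(tmp, header = FALSE, skip = 5, colClasses = x.class, nrows = 10)

str(res)     # a data frame with the single column V2
unlink(tmp)  # remove the temporary file
```

The surviving column keeps its positional name V2, which matches the column name shown in the ffdf output in this message.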
## *The following command works fine:*
> read.csv.ffdf(file=csvfile, header=FALSE, skip=100,
+               colClasses=x.class, nrows=1e3)
ffdf (all open) dim=c(1000,1), dimorder=c(1,2) row.names=NULL
ffdf virtual mapping
PhysicalName VirtualVmode PhysicalVmode AsIs VirtualIsMatrix
V2 V2 double double FALSE FALSE
PhysicalIsMatrix PhysicalElementNo PhysicalFirstCol PhysicalLastCol
V2 FALSE 1 1 1
PhysicalIsOpen
V2 TRUE
ffdf data
V2
1 -0.5412
2 -0.5842
3 -0.5920
4 -0.5451
5 -0.5099
6 -0.5021
7 -0.4943
8 -0.5490
: :
993 -0.4865
994 -0.6584
995 -0.7482
996 -0.8732
997 -0.8303
998 -0.7248
999 -0.5490
1000 -0.4240
*When I increase nrows by 1, I get a warning about the number of columns:*
> read.csv.ffdf(file=csvfile, header=FALSE, skip=100,
+               colClasses=x.class, nrows=1001)
ffdf (all open) dim=c(1001,1), dimorder=c(1,2) row.names=NULL
ffdf virtual mapping
PhysicalName VirtualVmode PhysicalVmode AsIs VirtualIsMatrix
V2 V2 double double FALSE FALSE
PhysicalIsMatrix PhysicalElementNo PhysicalFirstCol PhysicalLastCol
V2 FALSE 1 1 1
PhysicalIsOpen
V2 TRUE
ffdf data
V2
1 -0.5412
2 -0.5842
3 -0.5920
4 -0.5451
5 -0.5099
6 -0.5021
7 -0.4943
8 -0.5490
: :
994 -0.6584
995 -0.7482
996 -0.8732
997 -0.8303
998 -0.7248
999 -0.5490
1000 -0.4240
1001 -0.3849
Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote, :
cols = 1 != length(data) = 7
>
*Going well beyond 1000 rows fails with an error:*
> read.csv.ffdf(file=csvfile, header=FALSE, skip=100,
+               colClasses=x.class, nrows=1e4)
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
  more columns than column names
*The question is why. The number of columns in the file does not change.
I would appreciate any help.
Best, Robert
*
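Since the first 1000 rows read cleanly and the trouble starts with row 1001, one possible line of investigation is how rows beyond an initial chunk are handled; this message does not confirm the cause. As a point of comparison, here is a rough base-R sketch that reads one column chunk by chunk while passing the same colClasses to every chunk. The helper read_column_chunked, its arguments, and the toy data are hypothetical and not part of ff, and the result is held in memory as an ordinary vector rather than an ffdf.

```r
# Hypothetical helper (not part of ff): read one column in chunks with base R.
read_column_chunked <- function(path, col.classes, skip = 0, chunk.rows = 1000) {
  con <- file(path, open = "r")
  on.exit(close(con))
  if (skip > 0) readLines(con, n = skip)      # discard the leading lines
  pieces <- list()
  repeat {
    lines <- readLines(con, n = chunk.rows)   # next block of raw lines
    if (length(lines) == 0) break             # end of file reached
    # Every chunk is parsed with the full 7-entry colClasses vector.
    chunk <- read.csv(text = lines, header = FALSE, colClasses = col.classes)
    pieces[[length(pieces) + 1]] <- chunk[[1]]
  }
  unlist(pieces)
}

# Toy usage: 25 rows, 7 columns, skip 5 rows, chunks of 8 rows.
tmp <- tempfile(fileext = ".csv")
toy <- as.data.frame(matrix(seq_len(25 * 7) / 10, ncol = 7))
write.table(toy, tmp, sep = ",", row.names = FALSE, col.names = FALSE)
x.class <- c('NULL', 'numeric', rep('NULL', 5))
v2 <- read_column_chunked(tmp, x.class, skip = 5, chunk.rows = 8)
unlink(tmp)
```

For a file with 6e+7 rows a much larger chunk.rows would be sensible, and a single double column of that length needs roughly 480 MB of RAM.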
--
View this message in context: http://r.789695.n4.nabble.com/ff-package-reading-selected-columns-from-csv-tp4637794.html
Sent from the R help mailing list archive at Nabble.com.