[R] Efficient way to subset rows in R for dataset with 10^7 columns
Jack Arnestad
j@ck@rne@t@d @ending from gm@il@com
Sat Apr 14 02:31:32 CEST 2018
I have a data.table with 100 rows and 10^7 columns.
When I run
trainIndex <- caret::createDataPartition(
  df$status,
  p = .9,
  list = FALSE,
  times = 1
)
outerTrain <- df[trainIndex]
outerTest <- df[-trainIndex]
Subsetting the rows of df takes over 20 minutes.
What is the most efficient way to subset the rows of a table this wide?
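For anyone who wants to try this at a smaller scale, here is a minimal sketch of the setup. Everything in it other than the two subsetting lines is an assumption made for illustration: the column count is cut to 10^4, the data are random, the status column is a made-up two-level factor, and a plain sample() stands in for caret::createDataPartition() (so the split is not stratified) to keep the example free of the caret dependency.

```r
library(data.table)

set.seed(1)
n_rows <- 100L
n_cols <- 10000L  # assumption: scaled down from 10^7 so the example runs quickly

# Wide table of random numbers: 100 rows, n_cols numeric columns (V1, V2, ...).
df <- as.data.table(matrix(rnorm(n_rows * n_cols), nrow = n_rows))

# Hypothetical outcome column; the real status values are not shown in this post.
df[, status := factor(rep(c("a", "b"), length.out = n_rows))]

# Stand-in for caret::createDataPartition(): a simple 90% random sample of row numbers.
trainIndex <- sort(sample(n_rows, size = round(0.9 * n_rows)))

# Time the same two row subsets as in the original code.
timing <- system.time({
  outerTrain <- df[trainIndex]
  outerTest  <- df[-trainIndex]
})
print(timing)
```

Increasing n_cols should show how the subsetting time grows with the number of columns.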
Thanks!