[R] Sample rows in data frame by subsets
    Chris Stubben 
    stubben at lanl.gov
       
    Mon Jan 23 21:04:06 CET 2006
    
    
  
Hi,
I need to resample rows in a data frame by subsets. For example:
L3 <- LETTERS[1:3]
d <- data.frame(cbind(x=1, y=1:10), fac=sample(L3, 10, replace=TRUE))
d
    x  y fac
1  1  1   A
2  1  2   A
3  1  3   A
4  1  4   A
5  1  5   C
6  1  6   C
7  1  7   B
8  1  8   A
9  1  9   C
10 1 10   A
I have seen this used to sample rows with replacement:
d[sample(nrow(d), replace=TRUE), ]
     x  y fac
7   1  7   B
2   1  2   A
1   1  1   A
3   1  3   A
2.1 1  2   A
10  1 10   A
8   1  8   A
9   1  9   C
1.1 1  1   A
8.1 1  8   A
but I would like to resample within each level of fac, so that each level keeps its original number of rows:
summary(d$fac)
A B C
6 1 3
rbind(subset(d, fac=="A")[sample(6, replace=TRUE), ],
      subset(d, fac=="B")[sample(1, replace=TRUE), ],
      subset(d, fac=="C")[sample(3, replace=TRUE), ] )
     x  y fac
2   1  2   A
3   1  3   A
3.1 1  3   A
1   1  1   A
10  1 10   A
1.1 1  1   A
7   1  7   B
5   1  5   C
6   1  6   C
5.1 1  5   C
Is there an easy way to do this in one step or with a short function?  I 
have lots of data frames to resample.
Thanks,
Chris
-- 
-----------------
Chris Stubben
Los Alamos National Lab
BioScience Division
MS M888
Los Alamos, NM 87545
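
One way the per-level resampling asked about above might be wrapped in a short function is sketched here. This is only an illustration under stated assumptions, not an answer from the list: the name resample_by_group and its arguments are hypothetical, and the example data frame is typed in by hand to match the one printed in the post. It splits the row indices by level, draws positions within each level with replacement, and indexes the data frame once.

```r
# Hypothetical helper (name and arguments are not from the original post):
# resample rows with replacement within each level of a grouping column,
# so every level keeps its original row count.
resample_by_group <- function(d, group) {
  # Row indices of d, split into one integer vector per level of `group`
  idx_by_group <- split(seq_len(nrow(d)), d[[group]])
  # Draw positions with sample.int() rather than sample(idx): for a
  # one-row level, sample(idx) would sample from 1:idx instead of idx.
  picked <- lapply(idx_by_group, function(idx) {
    idx[sample.int(length(idx), replace = TRUE)]
  })
  d[unlist(picked, use.names = FALSE), , drop = FALSE]
}

# Example data in the same shape as the data frame printed in the post
d <- data.frame(x = 1, y = 1:10,
                fac = c("A", "A", "A", "A", "C", "C", "B", "A", "C", "A"))
set.seed(1)  # arbitrary seed, only to make the draw reproducible
r <- resample_by_group(d, "fac")
table(r$fac)  # per-level counts match table(d$fac)
```

For many data frames held in a list (say a hypothetical list called dfs), something like lapply(dfs, resample_by_group, group = "fac") would apply the same resampling to each one.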