[R] Subsetting problem data, 2
Rui Barradas
ruipbarradas at sapo.pt
Thu Jul 19 22:27:02 CEST 2012
Hello,
Try the following. The data is your example, Patients A through E,
recreated from the output of dput().
dat <- structure(list(Patient = structure(c(1L, 1L, 1L, 1L, 1L, 2L,
2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L), .Label = c("A",
"B", "C", "D", "E"), class = "factor"), Cycle = c(1L, 2L, 3L,
4L, 5L, 1L, 2L, 1L, 3L, 4L, 5L, 1L, 2L, 4L, 5L, 1L, 2L, 3L),
V1 = c(0.4, 0.3, 0.3, 0.4, 0.5, 0.4, 0.4, 0.9, 0.3, NA, 0.4,
0.2, 0.5, 0.6, 0.5, 0.1, 0.5, 0.4), V2 = c(0.1, 0.2, NA,
NA, 0.2, NA, NA, 0.9, 0.5, NA, NA, 0.5, 0.7, 0.4, 0.5, NA,
0.3, 0.3), V3 = c(0.5, 0.5, 0.6, 0.4, 0.5, NA, NA, 0.9, 0.6,
NA, NA, NA, NA, NA, NA, NA, NA, NA), V4 = c(1.5, 1.6, 1.7,
1.8, 1.5, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA), V5 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA)), .Names = c("Patient", "Cycle",
"V1", "V2", "V3", "V4", "V5"), class = "data.frame", row.names = c(NA,
-18L))
dat
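# Names of the measurement columns V1, ..., V9
nms <- names(dat)[grep("^V[1-9]$", names(dat))]
# One data frame per patient
dd <- split(dat, dat$Patient)
# TRUE if a column is partly missing: some values NA, some not
fun <- function(x) any(is.na(x)) && any(!is.na(x))
# TRUE for patients with at least one partly missing column
ix <- sapply(dd, function(x) Reduce(`|`, lapply(x[, nms], fun)))
dd[ix]                  # the "problem" patients, as a list
do.call(rbind, dd[ix])  # the same patients, as one data frame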
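To see what the helper fun() flags, here it is applied to three single columns taken from the example: one that is partly missing, one that was never measured, and one that was measured every cycle. Only the first kind marks a patient as a "problem".

```r
# Same helper as above: TRUE only when a column mixes NA and non-NA values
fun <- function(x) any(is.na(x)) && any(!is.na(x))

fun(c(0.1, 0.2, NA, NA, 0.2))  # Patient A, V2: partly missing -> TRUE
fun(c(NA, NA))                 # Patient B, V2: never measured -> FALSE
fun(c(0.5, 0.7, 0.4, 0.5))     # Patient D, V2: always measured -> FALSE
```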
I'm assuming that the variable names are as posted: V followed by a
single digit 1-9. To keep instead the patients whose tumors were measured
consistently (every column either fully observed or entirely NA), just
negate 'ix'; it's a logical index.
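A minimal sketch of the negation, on a cut-down data set made up for illustration (three patients, two variables; not the full example). The pattern "^V[0-9]+$" is my assumption for the case where the names can run past V9, and any(sapply(...)) is just an equivalent way of writing the Reduce() step.

```r
# Cut-down, illustrative data: A and E have a partly missing V2, B does not
dat <- data.frame(
  Patient = c("A", "A", "B", "B", "E", "E", "E"),
  Cycle   = c(1L, 2L, 1L, 2L, 1L, 2L, 3L),
  V1      = c(0.4, 0.3, 0.4, 0.4, 0.1, 0.5, 0.4),
  V2      = c(0.1, NA, NA, NA, NA, 0.3, 0.3)
)

# Assumption: variables are named V followed by one or more digits
nms <- grep("^V[0-9]+$", names(dat), value = TRUE)
dd  <- split(dat, dat$Patient)
fun <- function(x) any(is.na(x)) && any(!is.na(x))

# drop = FALSE keeps a data frame even if only one column matches
ix <- sapply(dd, function(x) any(sapply(x[, nms, drop = FALSE], fun)))

problem    <- do.call(rbind, dd[ix])   # patients with partly missing columns
consistent <- do.call(rbind, dd[!ix])  # negated index: the other patients
```

Here 'problem' holds the rows of Patients A and E, and 'consistent' holds the rows of Patient B.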
Note also that dput() is the best way of posting a data example.
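As a short illustration of why dput() is convenient: it prints R code that rebuilds the object, so a reader can paste it straight into a session. The data frame 'small' below is made up for the example, and the deparse()/eval(parse()) round trip only stands in for the copy and paste step.

```r
# A made-up data frame standing in for the poster's data
small <- data.frame(
  Patient = c("A", "A", "B"),
  Cycle   = c(1L, 2L, 1L),
  V1      = c(0.4, NA, 0.9)
)

# Prints a structure(list(...), ...) call; paste that output into the email
dput(small)

# What a reader does with it: evaluate the pasted text to rebuild the object
txt  <- paste(deparse(small), collapse = "\n")
copy <- eval(parse(text = txt))

# Check that the rebuilt object matches the original
isTRUE(all.equal(small, copy))
```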
Hope this helps,
Rui Barradas
On 19-07-2012 15:15, Lib Gray wrote:
> Hello,
>
> I didn't give enough information when I sent a query before, so I'm trying
> again with a more detailed explanation:
>
> In this data set, each patient has a different number of measured variables
> (they represent tumors, so some people had 2 tumors, some had 5, etc). The
> problem I have is that often in later cycles for a patient, tumors that
> were originally measured are now missing (or a "new" tumor showed up). We
> assume there are many different reasons for why a tumor would be measured
> in one cycle and not another, and so I want to subset OUT the "problem"
> patients to better study these patterns.
>
> An example:
>
> Patient  Cycle  V1   V2   V3   V4   V5
> A        1      0.4  0.1  0.5  1.5  NA
> A        2      0.3  0.2  0.5  1.6  NA
> A        3      0.3  NA   0.6  1.7  NA
> A        4      0.4  NA   0.4  1.8  NA
> A        5      0.5  0.2  0.5  1.5  NA
>
> I want to keep patient A; they have 4 measured tumors, but tumor 2 is
> missing data for cycles 3 and 4
>
> B        1      0.4  NA   NA   NA   NA
> B        2      0.4  NA   NA   NA   NA
>
> I do not want to keep patient B; they have 1 tumor that is measured
> consistently in both cycles
>
> C        1      0.9  0.9  0.9  NA   NA
> C        3      0.3  0.5  0.6  NA   NA
> C        4      NA   NA   NA   NA   NA
> C        5      0.4  NA   NA   NA   NA
>
> I do want to keep patient C; all their data is missing for cycle 4 and
> cycle 5 only measured one tumor
>
> D        1      0.2  0.5  NA   NA   NA
> D        2      0.5  0.7  NA   NA   NA
> D        4      0.6  0.4  NA   NA   NA
> D        5      0.5  0.5  NA   NA   NA
>
> I do not want patient D, their two tumors were measured each cycle
>
> E        1      0.1  NA   NA   NA   NA
> E        2      0.5  0.3  NA   NA   NA
> E        3      0.4  0.3  NA   NA   NA
>
> I DO want patient E; they only had one tumor register in Cycle 1, but
> cycles 2 and 3 had two tumors.
>
>
> Thanks for any help!