[R] the "union" of several data frame rows
Scot W. McNary
smcnary at charm.net
Fri Feb 8 23:23:06 CET 2008
Hi,
Thanks to Henrique Dallazuanna, Erik Iverson, Mark Leeds, and J. Scott
Olson for pointing me down the path of joy. I finally figured out a
solution to the problem:
Given the following list of partially overlapping test keys, a data
frame called keys1:
ID     X1   X2   X3   X4   X5   X6   X7   X8   X9   X10  X11  X12  X13  X14  X15
A KEY  D    <NA> D    A    <NA> D    D    D    A    <NA> <NA> <NA> <NA> <NA> <NA>
B KEY  D    <NA> D    A    <NA> D    D    D    A    <NA> <NA> <NA> <NA> <NA> <NA>
C KEY  D    <NA> D    A    <NA> D    D    D    A    <NA> <NA> <NA> <NA> <NA> <NA>
D KEY  D    C    D    A    B    D    D    D    A    D    D    D    A    C    C
E KEY  D    C    D    A    B    D    D    D    A    D    D    D    A    C    C
F KEY  D    C    D    <NA> B    D    <NA> <NA> <NA> D    <NA> <NA> <NA> <NA> <NA>
G KEY  D    <NA> D    A    <NA> D    D    D    A    <NA> <NA> <NA> <NA> <NA> <NA>
H KEY  D    C    D    A    B    D    D    D    A    D    D    D    A    C    C
I KEY  D    <NA> D    A    <NA> D    D    D    A    <NA> <NA> <NA> <NA> <NA> <NA>
J KEY  D    C    D    A    B    <NA> <NA> <NA> <NA> <NA> D    D    A    C    C
K KEY  D    C    <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
L KEY  D    C    D    <NA> B    D    <NA> <NA> <NA> D    <NA> <NA> <NA> <NA> <NA>
M KEY  D    <NA> D    A    <NA> D    D    D    A    <NA> <NA> <NA> <NA> <NA> <NA>
N KEY  D    <NA> D    A    <NA> D    D    D    A    <NA> <NA> <NA> <NA> <NA> <NA>
The goal was to wind up with a common test key:
Common Key D C D A B D D D A D D D A C C
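A common key like this is only well defined if the tests agree wherever their items overlap. As a sketch, one way to check that before collapsing the rows; `conflicting_items()` is a hypothetical helper (not from this thread), and the toy data frame is made up for illustration:

```r
# Hypothetical helper (not from the thread): names of the item columns
# on which the keys disagree, i.e. columns holding more than one distinct
# non-NA answer. Works for character or factor columns.
conflicting_items <- function(keys) {
  n_answers <- sapply(keys, function(x) length(unique(x[!is.na(x)])))
  names(keys)[n_answers > 1]
}

# Made-up toy data (not keys1): X2 is keyed "C" in one test, "B" in another.
toy <- data.frame(X1 = c("D", "D", NA),
                  X2 = c(NA, "C", "B"),
                  X3 = c("D", NA, NA),
                  stringsAsFactors = FALSE)

conflicting_items(toy)   # "X2"
```

On the real data the ID column would be dropped first, e.g. `conflicting_items(keys1[-1])`. An empty result (`character(0)`) means the common key does not depend on the order in which the rows are combined.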
What worked was the following:
for (i in 2:nrow(keys1)) {   # fill NAs in row 1 from each later row
  miss <- is.na(keys1[1, ])
  keys1[1, miss] <- keys1[i, miss]
}
When the loop finishes, keys1[1, ] holds the common key.
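For comparison, a loop-free sketch: filling row 1's gaps from each later row in turn leaves, in every column, the first non-NA answer in row order, which can be computed column by column. The function names and the toy data are hypothetical; it assumes the ID column is dropped before the call.

```r
# Loop-free sketch: for each item column, keep the first non-NA answer,
# which is what the row-by-row fill ends up with.
# Assumes the data frame passed in has only item columns (drop any ID column).
first_non_na <- function(x) as.character(x[!is.na(x)][1])
common_key <- function(keys) sapply(keys, first_non_na)

# Made-up toy data shaped like keys1 (ID column plus item columns)
toy <- data.frame(ID = c("A KEY", "B KEY", "C KEY"),
                  X1 = c("D", NA, "D"),
                  X2 = c(NA, NA, "C"),
                  X3 = c(NA, "A", NA),
                  stringsAsFactors = FALSE)

common_key(toy[-1])   # X1 "D", X2 "C", X3 "A"
```

Unlike the loop, this leaves the original data frame unchanged; like the loop, it silently keeps the first answer if two tests disagree on an item. A column with no answers at all comes back as NA.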
In my first example I neglected to mention that the real data contain <NA>
values, which may have affected the kinds of solutions that were
suggested. Chalk up another testimonial in favor of providing small,
workable examples when asking for help.
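The first example stored unused items as blank strings (" ") rather than NA. As a sketch, recoding the blanks to NA lets the same first-non-NA idea reproduce the key asked for in that post (A C D B B D); the matrix is the one from the original question, and the recoding step is an assumption about how blanks should be treated.

```r
# The four-test example from the first post, where unused items were " "
key <- matrix(c("1", "A", "C", " ", " ", " ", " ",
                "2", " ", " ", " ", " ", "B", "D",
                "3", "A", " ", "D", "B", " ", " ",
                "4", " ", "C", "D", " ", "B", "D"),
              byrow = TRUE, ncol = 7)
items <- key[, -1]                    # drop the id column
items[items == " "] <- NA             # treat blank cells as missing
colnames(items) <- paste("q", 1:6, sep = "")

# First non-NA answer in each item column
common <- apply(items, 2, function(x) x[!is.na(x)][1])
common   # q1 "A", q2 "C", q3 "D", q4 "B", q5 "B", q6 "D"
```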
Thanks very much,
Scot
Henrique Dallazuanna wrote:
> Perhaps:
>
> data <- data.frame(key, row.names=1)
> names(data) <- paste("q", 1:6, sep="")
> apply(data, 2, function(x)unique(x)[unique(x) != " "])
>
>
> On 01/02/2008, Scot W. McNary <smcnary at charm.net> wrote:
>
>> Hi,
>>
>> I have a question about how to obtain the union of several data frame
>> rows. I'm trying to create a common key for several tests composed of
>> different items. Here is a small scale version of the problem. These
>> are keys for 4 different tests, not all mutually exclusive:
>>
>> id q1 q2 q3 q4 q5 q6
>> 1  A  C
>> 2              B  D
>> 3  A     D  B
>> 4     C  D     B  D
>>
>> I would like to create a single key for all test versions, the "union" of
>> the above:
>>
>> id q1 q2 q3 q4 q5 q6
>> key A C D B B D
>>
>>
>> Here is what I have (unsuccessfully) tried so far:
>>
>> > key <-
>> + matrix(c("1", "A", "C", " ", " ", " ", " ",
>> + "2", " ", " ", " ", " ", "B", "D",
>> + "3", "A", " ", "D", "B", " ", " ",
>> + "4", " ", "C", "D", " ", "B", "D"),
>> + byrow=TRUE, ncol = 7)
>> >
>> > k1 <- key[1, 2:7]
>> > k2 <- key[2, 2:7]
>> > k3 <- key[3, 2:7]
>> > k4 <- key[4, 2:7]
>> >
>> > itemid <- c("q1", "q2", "q3", "q4", "q5", "q6")
>> >
>> > k1 <- cbind(itemid, k1)
>> > k2 <- cbind(itemid, k2)
>> > k3 <- cbind(itemid, k3)
>> > k4 <- cbind(itemid, k4)
>> >
>> > tmp <- merge(k1, k2, by = "itemid")
>> > tmp <- merge(tmp, k3, by = "itemid")
>> > tmp <- merge(tmp, k4, by = "itemid")
>> >
>> > t(tmp)
>> [,1] [,2] [,3] [,4] [,5] [,6]
>> itemid "q1" "q2" "q3" "q4" "q5" "q6"
>> k1 "A" "C" " " " " " " " "
>> k2 " " " " " " " " "B" "D"
>> k3 "A" " " "D" "B" " " " "
>> k4 " " "C" "D" " " "B" "D"
>>
>> The actual problem involves 300 or so items instead of 6 and 10
>> different keys instead of four. Any suggestions welcome.
>>
>> Thanks in advance,
>>
>> Scot McNary
>>
>> > version
>> _
>> platform i386-pc-mingw32
>> arch i386
>> os mingw32
>> system i386, mingw32
>> status
>> major 2
>> minor 6.1
>> year 2007
>> month 11
>> day 26
>> svn rev 43537
>> language R
>> version.string R version 2.6.1 (2007-11-26)
>>
>>
>> --
>> Scot McNary
>> smcnary at charm dot net
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
>
>
--
Scot McNary
smcnary at charm dot net