[R] summarize dataframe based on multiple cols, not their combinations
Ista Zahn
istazahn at gmail.com
Wed Mar 20 21:18:54 CET 2013
How about
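library(reshape2)  # melt()
library(plyr)      # ddply()
# Stack a, b, c into long format: one row per (original row, flag column).
mdf.m <- melt(my_df, measure.vars=c("a", "b", "c"))
# Keep only the rows where the flag is set.
mdf.m <- mdf.m[mdf.m$value > 0, ]
# One summary row per flag column, regardless of the other flags.
ddply(mdf.m, "variable", function(x) c("mean"=mean(x$dat), "n"=nrow(x)))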
?
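If you would rather avoid the extra packages, the same idea can be sketched in base R. This is only a rough alternative, not anything from plyr or reshape2: it loops over the flag columns and summarizes dat wherever the flag is set. The names flags, keep and res are made up here for illustration, and my_df is rebuilt from the example in the original question.

```r
# Example data copied from the original question.
my_df <- data.frame(a = c(1, 1, 1, 0, 0, 0), b = c(0, 0, 0, 1, 1, 1),
                    c = c(1, 0, 1, 0, 1, 0), dat = c(10, 11, 12, 13, 14, 15))

# Columns to summarize over, one at a time (hypothetical helper name).
flags <- c("a", "b", "c")

# For each flag column, keep the rows where it is set and summarize dat.
res <- do.call(rbind, lapply(flags, function(v) {
  keep <- my_df[[v]] > 0
  data.frame(variable = v, mean = mean(my_df$dat[keep]), n = sum(keep))
}))

res
```

On the example data this should give one row per flag column, with means 11, 14 and 12 and n = 3 in each case, which matches the table asked for in the question.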
Best,
Ista
On Wed, Mar 20, 2013 at 3:57 PM, Alexander Shenkin <ashenkin at ufl.edu> wrote:
> Hi folks,
>
> I'm trying to figure out how to get summarized data based on multiple
> columns. However, instead of giving summaries for every combination of
> categorical columns, I want it for each value of each categorical column
> regardless of the other columns. I could do this with three different
> commands, but I'm wondering if there's a more elegant way that I'm
> missing. Thanks!
>
> allie
>
>> my_df = data.frame(a = c(1,1,1,0,0,0), b=c(0,0,0,1,1,1),
> c=c(1,0,1,0,1,0), dat=c(10,11,12,13,14,15))
>
>> my_df
>   a b c dat
> 1 1 0 1  10
> 2 1 0 0  11
> 3 1 0 1  12
> 4 0 1 0  13
> 5 0 1 1  14
> 6 0 1 0  15
>
>> # not what I want
>> ddply(my_df, .(a,b,c), function(x) c("mean"=mean(x$dat), "n"=nrow(x)))
>   a b c mean n
> 1 0 1 0   14 2
> 2 0 1 1   14 1
> 3 1 0 0   11 1
> 4 1 0 1   11 2
>
> What I want:
>   a b c mean n
> 1 1 * *   11 3
> 2 * 1 *   14 3
> 3 * * 1   12 3
>
> where "*" refers to any value of the other columns.