[R] Reshaping a dataframe with conditional summary of columns - apply or reshape?

Dave Hewitt dhewitt at vims.edu
Thu Jan 24 05:31:15 CET 2008


I need to do a somewhat complex rearrangement of an existing dataframe that 
involves an "on-the-fly" conditional summary as well. I've tried to make 
the example as simple as possible. Code to produce the object being acted 
on is at the end of the message. I hope this isn't too much for a single 
posting. Any guidance will be much appreciated.

The input example looks like this:

survey	year	count	location	var1	var2
1	1	2	A1	21.2	1
2	1	0	A1	15.6	1
3	1	29	A1	12.1	1
1	1	11	B3	9	2
3	1	5	B3	4.8	2
1	2	7	A2	20.1	4
2	2	2	A2	19.2	4

I need to rearrange it to look like this (the order of the columns is not 
important):

location	year	survey1	survey2	survey3	var1summ	var2
A1	1	1	0	2	16.3	1
B3	1	2	NA	1	6.9	2
A2	2	1	1	NA	19.7	4

The operations/conditions that I need are:

(A) The observations in "count" for each survey at each location in a given 
year need to be reduced according to the following rules and then placed in 
new columns "survey1", "survey2", "survey3" :
-- when  count = 0,  surveyX = 0
-- when  0 < count < 11,  surveyX = 1
-- when  11 < count,  surveyX = 2

(B) The observations in var1 need to be summarized (here, by taking the 
mean) across all surveys for a given location in a given year and placed in 
"var1summ"

(C) Observations in var2 need to be summarized in a similar way, except 
that they don't change across surveys for a given location in a given year 
(so mean, median, etc. would work, but a more general solution would be 
nice for variables that are categorical)

(D) Surveys (rows) can be missing for a given location in a given year. I'd 
like to retain a column for each survey in the new dataframe and denote 
surveys that were missing as NA. E.g., survey2 is missing for location B3 
in year 1.


** Object creation:
x <- structure(list(survey = c(1L, 2L, 3L, 1L, 3L, 1L, 2L), year = c(1L, 
1L, 1L, 1L, 1L, 2L, 2L), count = c(2L, 0L, 29L, 11L, 5L, 7L, 2L), location 
= structure(c(1L, 1L, 1L, 3L, 3L, 2L, 2L), .Label = c("A1", "A2", "B3"), 
class = "factor"), var1 = c(21.2, 15.6, 12.1, 9, 4.8, 20.1, 19.2), var2 = 
c(1L, 1L, 1L, 2L, 2L, 4L, 4L)), .Names = c("survey", "year", "count", 
"location", "var1", "var2"), class = "data.frame", row.names = c(NA, -7L))

Thanks,
Dave Hewitt
Fisheries Science
Virginia Institute of Marine Science, Gloucester Point, VA, USA



More information about the R-help mailing list