[R] How should I organize data to compare differences in matched pairs?

Greg Snow Greg.Snow at imail.org
Thu Jan 24 20:05:41 CET 2008


Here is how I would do it (there are multiple ways you could do it, so
there is no single "right" answer):

Assign each person a unique identifier.

Put all the information from the questionnaire, along with the identifier
and anything else that does not change between rounds (age, sex, height,
...), into one data frame.  This data frame will have as many rows as you
have subjects.

The round information then goes into a second data frame with one row
per round (so each subject has multiple rows); include the unique
identifier on every row for that person.

If you need information combined from both data frames, then use the
merge function to merge the 2 data frames (or subsets of them) together.
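
A minimal sketch of this layout, in case it helps.  The column names
(id, age, sex, round, score) and all of the values are made up for
illustration, and the last step is just one possible way to get the
round-to-round differences, not the only one:

```r
# One row per subject: the identifier plus everything that does not
# change between rounds (hypothetical columns, made-up values).
subjects <- data.frame(
  id  = c(1, 2, 3),
  age = c(23, 31, 27),
  sex = c("F", "M", "F")
)

# One row per subject per round, carrying the same identifier.
rounds <- data.frame(
  id    = c(1, 1, 2, 2, 3, 3),
  round = c(1, 2, 1, 2, 1, 2),
  score = c(10.2, 11.5, 9.8, 9.1, 12.0, 13.4)
)

# Combine the two only when an analysis needs columns from both.
combined <- merge(subjects, rounds, by = "id")

# Differences between rounds: put the two rounds side by side by
# merging the round data with itself on id, then subtract.
wide <- merge(
  rounds[rounds$round == 1, c("id", "score")],
  rounds[rounds$round == 2, c("id", "score")],
  by = "id", suffixes = c(".r1", ".r2")
)
wide$diff <- wide$score.r2 - wide$score.r1
```

With the rounds side by side like this, something along the lines of
t.test(wide$score.r2, wide$score.r1, paired = TRUE) could then be used
to compare them, assuming a paired t-test suits your data.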

Advantages of this method include:

Uses data frames, which most of the analysis functions expect.
Each piece of data is entered only once (other than the id).

Disadvantage:

Data is split between 2 objects.


Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
(801) 408-8111
 
 

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Thomas Levine
> Sent: Thursday, January 24, 2008 11:43 AM
> To: r-help at r-project.org
> Subject: [R] How should I organize data to compare 
> differences in matchedpairs?
> 
> I'm just learning how to use R right now, so I'm not sure 
> what the most efficient way to organize these data is.
> 
> I had subjects perform the same task twice with slight 
> changes between the rounds. I want to analyze differences 
> between the rounds. All of the subjects also answered a questionnaire.
> 
> Putting all of one subject's information on one row seems sloppy.
> 
> I was thinking about making a three-dimensional array with 
> subject number, round and measurement as axes, but then the 
> differences would have to be the third column in the round 
> axis, which also seemed messy. Also, I would have duplicates 
> of all of the information from the questionnaire, which seems 
> inefficient.
> 
> Or maybe I could just use a matrix where round is just 
> another column among all of the measurements. This is similar 
> to the previous arrangement, but I don't know which is 
> better. It still has all of the duplicated information that 
> the previous method has.
> 
> Anyway, I'm sure someone's done this before, so I'd like to 
> see what other people have done for data like these.
> 
> Thomas Levine


