[R] finding birth position
Deepankar Basu
basu.15 at osu.edu
Fri Oct 26 17:16:41 CEST 2007
Thanks a lot for all the comments and suggestions. They have helped me
solve the problem. I find the "wide" to "long" transformation of the
data especially helpful. I used this in Stata but was not aware that I
could do the same in R.
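
For readers of the archive, here is a minimal sketch of how the long format can
produce the boy_1, ..., boy_6 layout asked about in the original post. It
assumes the six example families from the thread; the helper name pad, the
object name boy_pos and the zero-padding step are illustrative assumptions,
not code posted by anyone in the thread.

```r
# Example data from the thread: 1 = male, 2 = female, NA = no birth recorded
birth <- data.frame(
  b1 = c(1, 2, 1, 2, 1, 2),
  b2 = c(2, 2, 2, 1, NA, 1),
  b3 = c(1, NA, 1, NA, NA, 2),
  b4 = c(2, NA, 1, NA, NA, 1),
  b5 = c(NA, NA, 1, NA, NA, NA),
  b6 = rep(NA_real_, 6)
)

# Wide -> long: one row per (family, birth order) pair
bl <- reshape(birth, varying = list(names(birth)), v.names = "sex",
              timevar = "ord", idvar = "family", direction = "long")

# Drop births that did not occur, then label the sexes
bl <- subset(bl, !is.na(sex))
bl$sex <- factor(bl$sex, levels = 1:2, labels = c("M", "F"))
boys <- subset(bl, sex == "M")

# Hypothetical helper: pad a family's boy positions with zeros up to length n
pad <- function(pos, n = 6) c(sort(pos), rep(0, n - length(pos)))

# Keep all six families as levels so that families without boys are retained
by_family <- split(boys$ord, factor(boys$family, levels = 1:6))
boy_pos <- t(sapply(by_family, pad))
colnames(boy_pos) <- paste0("boy_", 1:6)

boy_pos["3", ]  # positions 1, 3, 4, 5, then zeros
boy_pos["2", ]  # all zeros: family 2 has no boys
```

Each row of boy_pos is one family, so the matrix can be bound back onto the
wide data with cbind() if that layout is more convenient for the likelihood.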
Deepankar
On Fri, 2007-10-26 at 08:44 -0500, Douglas Bates wrote:
> Another approach is to convert the data frame that you have in what is
> sometimes called the "wide" format to the "long" format. See ?reshape
> for details on this transformation.
>
> In the process of doing the conversion I would also convert the sex of
> the child to a factor with meaningful levels and the family number to
> a factor.
>
> > birth # data in the original, "wide" format
>   b1 b2 b3 b4 b5 b6
> 1  1  2  1  2 NA NA
> 2  2  2 NA NA NA NA
> 3  1  2  1  1  1 NA
> 4  2  1 NA NA NA NA
> 5  1 NA NA NA NA NA
> 6  2  1  2  1 NA NA
> > bl <- reshape(birth, varying = list(1:6),
>                 v.names = "sex", timevar = "ord",
>                 idvar = "family", direction = "long")
> > head(bl, n = 8) # a data frame with 3 columns
>     ord sex family
> 1.1   1   1      1
> 2.1   1   2      2
> 3.1   1   1      3
> 4.1   1   2      4
> 5.1   1   1      5
> 6.1   1   2      6
> 1.2   2   2      1
> 2.2   2   2      2
> > bl$sex <- factor(bl$sex, labels = c("M", "F")) # use a factor with meaningful labels
> > bl <- subset(bl, !is.na(sex)) # remove records of births that did not occur
> > bl$family <- factor(bl$family) # convert family to a factor
> > str(bl) # resulting structure has only 18 rows
> 'data.frame': 18 obs. of 3 variables:
>  $ ord   : int 1 1 1 1 1 1 2 2 2 2 ...
>  $ sex   : Factor w/ 2 levels "M","F": 1 2 1 2 1 2 2 2 2 1 ...
>  $ family: Factor w/ 6 levels "1","2","3","4",..: 1 2 3 4 5 6 1 2 3 4 ...
> > bl
>     ord sex family
> 1.1   1   M      1
> 2.1   1   F      2
> 3.1   1   M      3
> 4.1   1   F      4
> 5.1   1   M      5
> 6.1   1   F      6
> 1.2   2   F      1
> 2.2   2   F      2
> 3.2   2   F      3
> 4.2   2   M      4
> 6.2   2   M      6
> 1.3   3   M      1
> 3.3   3   M      3
> 6.3   3   F      6
> 1.4   4   F      1
> 3.4   4   M      3
> 6.4   4   M      6
> 3.5   5   M      3
> > subset(bl, sex == "M") # these are the births of males only
>     ord sex family
> 1.1   1   M      1
> 3.1   1   M      3
> 5.1   1   M      5
> 4.2   2   M      4
> 6.2   2   M      6
> 1.3   3   M      1
> 3.3   3   M      3
> 3.4   4   M      3
> 6.4   4   M      6
> 3.5   5   M      3
> > with(subset(bl, sex == "M"), tapply(ord, family, min)) # first male birth in family
>  1  2  3  4  5  6
>  1 NA  1  2  1  2
>
> The wide format may seem a natural representation for such data but
> frequently it is inefficient and awkward. The long format is much
> easier to manipulate in R.
>
> On 10/25/07, jim holtman <jholtman at gmail.com> wrote:
> > You might want to consider another representation, but it would depend
> > on how you want to use it. Here is a 'list' that records for each row
> > the position of the boys; does this start to give you the type of data
> > that you want? These are the numeric values of where the boys occur.
> >
> > > x.m
> >      b1 b2 b3 b4 b5 b6
> > [1,]  1  2  1  2 NA NA
> > [2,]  2  2 NA NA NA NA
> > [3,]  1  2  1  1  1 NA
> > [4,]  2  1 NA NA NA NA
> > [5,]  1 NA NA NA NA NA
> > [6,]  2  1  2  1 NA NA
> > > apply(x.m, 1, function(a)which(a == 1))
> > [[1]]
> > b1 b3
> >  1  3
> >
> > [[2]]
> > named integer(0)
> >
> > [[3]]
> > b1 b3 b4 b5
> >  1  3  4  5
> >
> > [[4]]
> > b2
> >  2
> >
> > [[5]]
> > b1
> >  1
> >
> > [[6]]
> > b2 b4
> >  2  4
> >
> > >
> >
> >
> > On 10/25/07, Deepankar Basu <basu.15 at osu.edu> wrote:
> > > Hi All,
> > >
> > > I have data (in a data frame) on the sequence of births for families
> > > with a completed fertility cycle; the relevant variables are called b1,
> > > b2, b3, b4, b5, b6 and record the sex of the first, second, ..., sixth
> > > child. So,
> > > b1=1 if the first birth is male,
> > > b1=2 if the first birth is female,
> > > and b1=NA if the family did not record any first birth.
> > >
> > > Similarly for b2, b3, b4, b5 and b6.
> > >
> > > I want to record the positions of the male children within their
> > > family's birth history. So, I was thinking of creating six variables
> > > boy_1, boy_2, ..., boy_6. boy_1 would record the position of the first
> > > boy, boy_2 would record the position of the second boy and so on till
> > > boy_6. I want to assign a value of zero to boy_i if the family in
> > > question did not have an i-th boy.
> > >
> > > I am not sure how best to do this (i.e., whether to create variables as
> > > I have suggested or do something else) and would appreciate any
> > > suggestions. Later, I want to use the information on the position of the
> > > male births to compute a likelihood function and do an MLE.
> > >
> > > Here is how my data frame would look:
> > >
> > > b1 b2 b3 b4 b5 b6
> > >  1  2  1  2 NA NA
> > >  2  2 NA NA NA NA
> > >  1  2  1  1  1 NA
> > >  2  1 NA NA NA NA
> > >  1 NA NA NA NA NA
> > >  2  1  2  1 NA NA
> > >
> > > Thanks in advance.
> > >
> > > Deepankar
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
> >
> > --
> > Jim Holtman
> > Cincinnati, OH
> > +1 513 646 9390
> >
> > What is the problem you are trying to solve?
> >