[R] How to do the same thing for all levels of a column?

John Kane jrkrideau at inbox.com
Tue Jul 24 17:18:45 CEST 2012


   I think this does what you want, using two packages, plyr and reshape2,
   which you may have to install.  If so, install.packages(c("plyr", "reshape2"))
   should do the trick.
   library(plyr)
   library(reshape2)
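   # using the supplied data frame 'myfile' from below
   time0total <- sum(myfile[, 2])   # total of the Time_zero column
   mydata <- myfile[, 2:10]         # drop the Proteins column
   # long format: one row per protein and position
   md1 <- melt(mydata, id = "Time_zero")
   # share of Time_zero for each residue at each position
   ddply(md1, .(variable, value), summarise, ratio = sum(Time_zero)/time0total)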
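
   If you would rather not install anything, the same idea can be sketched in
   base R with tapply() and lapply().  This is only an illustration, not a
   replacement for the plyr code above: it rebuilds a shortened copy of the
   sample data (first two positions only), and the names residue_cols and
   ratios are my own.

```r
# Shortened copy of the sample data from the thread (positions X1 and X2 only).
myfile <- data.frame(
  Proteins  = c("p1", "p2", "p3", "p4"),
  Time_zero = c(0.0050723, 0.0002731, 9.76e-05, 0.0002077),
  X1 = c("L", "T", "L", "R"),
  X2 = c("E", "E", "M", "E"),
  stringsAsFactors = TRUE
)

time0total <- sum(myfile$Time_zero)

# Residue columns: everything except Proteins and Time_zero.
residue_cols <- names(myfile)[-(1:2)]

# For each position, sum Time_zero within each residue, then divide by the total.
ratios <- lapply(myfile[residue_cols], function(residues) {
  tapply(myfile$Time_zero, residues, sum) / time0total
})

ratios$X1   # named ratios for L, R and T at position 1
```

   The result is a list with one named vector per position, and the ratios
   within each position should add up to 1.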


   John Kane
   Kingston ON Canada

   -----Original Message-----
   From: zj29 at cornell.edu
   Sent: Tue, 24 Jul 2012 10:25:21 -0400
   To: jrkrideau at inbox.com
   Subject: Re: [R] How to do the same thing for all levels of a column?

   Hi John,
   Thank you for the tips. My apologies about the unreadable sample data...
   So here is the output of the sample data, and hopefully it works this time
   :)
   myfile  <-  structure(list(Proteins = structure(1:4, .Label = c("p1", "p2",
   "p3", "p4"), class = "factor"), Time_zero = c(0.0050723, 0.0002731,
   9.76e-05, 0.0002077), X1 = structure(c(1L, 3L, 1L, 2L), .Label = c("L",
   "R", "T"), class = "factor"), X2 = structure(c(1L, 1L, 2L, 1L
   ), .Label = c("E", "M"), class = "factor"), X3 = structure(c(2L,
   1L, 2L, 2L), .Label = c("N", "Y"), class = "factor"), X4 = structure(c(1L,
   2L,  3L,  2L),  .Label  =  c("I",  "L",  "Q"), class = "factor"), X5 =
   structure(c(1L,
   2L, 1L, 1L), .Label = c("I", "V"), class = "factor"), X6 = structure(c(1L,
   1L, 1L, 2L), .Label = c("P", "S"), class = "factor"), X7 = structure(c(1L,
   3L,  2L,  2L),  .Label  =  c("D",  "E",  "G"), class = "factor"), X8 =
   structure(c(1L,
   1L,  2L,  1L),  .Label  =  c("A",  "C"),  class = "factor")), .Names =
   c("Proteins",
   "Time_zero", "X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8"), row.names =
   c(NA,
   4L), class = "data.frame")
   And here is my original question:
   Basically, I have a bunch of protein sequences composed of different amino
   acid residues, and each residue is represented by an uppercase letter. I
   want to calculate the ratio of different amino acid residues at each
   position of the proteins.

   If I name this table as myfile.txt, I have the following script to
   calculate the ratio of each amino acid residue at position 1:

   # showing levels of the 3rd column, which means the types of residues
   > myfile[,3]

   # calculating the ratio of L
   > list=c(which(myfile[,3]=="L"))
   > time0total=sum(myfile[,2])
   > AA_L=0
   > for (i in 1:length(list)){AA_L=sum(myfile[list[[i]],2]+AA_L)}
   > ratio_L=AA_L/time0total


   So how can I write a script to do the same thing for the other two levels (T
   and R) in column 3, and also do this for every column that contains amino
   acid residues?

   Thanks a lot!

   Regards,

   Zhao
   2012/7/24 John Kane <[1]jrkrideau at inbox.com>

     First thing is to supply the data in a usable format.  As it is, it is
     essentially unreadable.  All R beginners do this. :)
     Have a look at the dput function  (?dput) for a good way to supply sample
     data in an email.
     If you have a large dataset probably a few dozen lines of data would be
     fine.
     Something like dput(head(mydata)) should be fine.  Just copy and paste the
     output into your email.
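
     As a rough illustration of what that looks like, here is a sketch with a
     made-up data frame.  The column names and values are hypothetical and
     only stand in for your real data.

```r
# A small stand-in data frame (hypothetical values, for illustration only).
mydata <- data.frame(
  id    = 1:10,
  group = rep(c("a", "b"), times = 5),
  value = seq(0.5, 5, by = 0.5)
)

# Prints R code that recreates the first six rows.
# Copy that printed text into the email.
dput(head(mydata))

# A reader then pastes the text after an assignment, for example
#   mydata <- structure(list(...), ...)
# and gets the same six rows back as a data frame.
```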
     Welcome to R.  I think you will like it.
     John Kane
     Kingston ON Canada

   > -----Original Message-----
   > From: [2]zj29 at cornell.edu
   > Sent: Mon, 23 Jul 2012 18:01:11 -0400
   > To: [3]r-help at r-project.org
   > Subject: [R] How to do the same thing for all levels of a column?
   >
   > Dear all,
   >
   >
   >
   > I am an R beginner, and I am looking for a way to do the same thing for
   > all levels of a column in a table.
   >
   >
   >
   > Basically, I have a bunch of protein sequences composed of different
   > amino
   > acid residues, and each residue is represented by an uppercase letter. I
   > want to calculate the ratio of different amino acid residues at each
   > position of the proteins. Here is an example table:
   >
   > Proteins  Time_zero  1  2  3  4  5  6  7  8
   > p1        0.0050723  L  E  Y  I  I  P  D  A
   > p2        0.0002731  T  E  N  L  V  P  G  A
   > p3        9.757E-05  L  M  Y  Q  I  P  E  C
   > p4        0.0002077  R  E  Y  L  I  S  E  A
   >
   >
   >
   > If I name this table as myfile.txt, I have the following scripts to
   > calculate the ratio of each amino acid residue at position 1:
   >
   > # showing levels of the 3rd column, which means the types of residues
   > > myfile[,3]
   >
   > # calculating the ratio of L
   > > list=c(which(myfile[,3]=="L"))
   > > time0total=sum(myfile[,2])
   > > AA_L=0
   > > for (i in 1:length(list)){AA_L=sum(myfile[list[[i]],2]+AA_L)}
   > > ratio_L=AA_L/time0total
   >
   >
   >
   > So how can I write a script to do the same thing for the other two levels
   > (T and R) in column 3, and also do this for every column that contains
   > amino acid residues?
   >
   >
   >
   > Many thanks for any help you could give me on this topic! :)
   >
   >
   >
   > Regards,
   >
   > Zhao
   > --
   > Zhao JIN
   > Ph.D. Candidate
   > Ruth Ley Lab
   > 467 Biotech
   > Field of Microbiology, Cornell University
   > Lab: 607.255.4954
   > Cell: 412.889.3675
   >

     >       [[alternative HTML version deleted]]
     >
     > ______________________________________________
     > [4]R-help at r-project.org mailing list
     > [5]https://stat.ethz.ch/mailman/listinfo/r-help
     > PLEASE do read the posting guide
     > [6]http://www.R-project.org/posting-guide.html
     > and provide commented, minimal, self-contained, reproducible code.


References

   1. mailto:jrkrideau at inbox.com
   2. mailto:zj29 at cornell.edu
   3. mailto:r-help at r-project.org
   4. mailto:R-help at r-project.org
   5. https://stat.ethz.ch/mailman/listinfo/r-help
   6. http://www.R-project.org/posting-guide.html
   7. http://www.inbox.com/marineaquarium
   8. http://www.inbox.com/earth
   9. http://www.inbox.com/earth

