[R] How to do the same thing for all levels of a column?
Bert Gunter
gunter.berton at gene.com
Tue Jul 24 18:37:44 CEST 2012
OK, I admit it: I re-read what you wrote and now I'm confused. Is:
> sapply(myfile[,-c(1,2)],function(x)prop.table(tapply(f,x)))
X1 X2 X3 X4 X5 X6 X7 X8
[1,] 0.1428571 0.2 0.2857143 0.125 0.2 0.2 0.125 0.2
[2,] 0.4285714 0.2 0.1428571 0.250 0.4 0.2 0.375 0.2
[3,] 0.1428571 0.4 0.2857143 0.375 0.2 0.2 0.250 0.4
[4,] 0.2857143 0.2 0.2857143 0.250 0.2 0.4 0.250 0.2
what you want?
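
For comparison, here is a minimal sketch of the Time_zero-weighted ratios asked for in the original question. It assumes that `f` above stands for `myfile$Time_zero` and that a summing function was intended as the third argument to tapply(); without a function, tapply() only returns the group index of each row. The name `weighted_ratios` is made up for this illustration.

```r
# Rebuild the sample data from the thread (values as posted by the OP).
myfile <- data.frame(
  Proteins  = c("p1", "p2", "p3", "p4"),
  Time_zero = c(0.0050723, 0.0002731, 9.76e-05, 0.0002077),
  X1 = c("L", "T", "L", "R"), X2 = c("E", "E", "M", "E"),
  X3 = c("Y", "N", "Y", "Y"), X4 = c("I", "L", "Q", "L"),
  X5 = c("I", "V", "I", "I"), X6 = c("P", "P", "P", "S"),
  X7 = c("D", "G", "E", "E"), X8 = c("A", "A", "C", "A"),
  stringsAsFactors = TRUE
)

# For each residue column, sum Time_zero within each residue level,
# then divide by the overall total so the ratios in a column sum to 1.
# lapply() is used because the columns have different numbers of levels.
weighted_ratios <- lapply(
  myfile[, -c(1, 2)],
  function(x) prop.table(tapply(myfile$Time_zero, x, sum))
)

# Ratios for position 1; the "L" entry corresponds to ratio_L in the
# original loop.
weighted_ratios$X1
```

Each element of the list is a named vector, one entry per residue that appears at that position.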
-- Bert
On Tue, Jul 24, 2012 at 9:17 AM, Bert Gunter <bgunter at gene.com> wrote:
> The OP's request is a bit ambiguous to me: at a given residue, do you
> wish to calculate the proportions for only those amino acids that
> appear at that residue, or do you wish to include the proportions for
> all amino acids, some of which might then be 0?
>
> Assuming the former, then I don't think one needs to go to the lengths
> described by John below.
>
> Using your example (thanks!), the following seems to suffice:
>
>> sapply(myfile[,-c(1,2)],function(x)prop.table(table(x)))
>
> $X1
> x
> L R T
> 0.50 0.25 0.25
>
> $X2
> x
> E M
> 0.75 0.25
>
> $X3
> x
> N Y
> 0.25 0.75
>
> $X4
> x
> I L Q
> 0.25 0.50 0.25
>
> $X5
> x
> I V
> 0.75 0.25
>
> $X6
> x
> P S
> 0.75 0.25
>
> $X7
> x
> D E G
> 0.25 0.50 0.25
>
> $X8
> x
> A C
> 0.75 0.25
>
>
> This could, of course, then be modified to add zero proportions for
> all non-appearing amino acids.
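
One possible way to add the zero proportions, sketched under the assumption that the "full" set of amino acids is simply the union of residues seen in any column (a fixed 20-letter alphabet could be substituted). The names `residues`, `all_levels` and `props` are made up for this illustration.

```r
# Residue columns from the sample data in this thread.
residues <- data.frame(
  X1 = c("L", "T", "L", "R"), X2 = c("E", "E", "M", "E"),
  X3 = c("Y", "N", "Y", "Y"), X4 = c("I", "L", "Q", "L"),
  X5 = c("I", "V", "I", "I"), X6 = c("P", "P", "P", "S"),
  X7 = c("D", "G", "E", "E"), X8 = c("A", "A", "C", "A"),
  stringsAsFactors = TRUE
)

# Assumption: the common alphabet is every residue observed anywhere.
all_levels <- sort(unique(unlist(lapply(residues, as.character))))

# Re-level each column to the common alphabet so that table() also
# reports residues with a count of zero at that position.
props <- sapply(
  residues,
  function(x) prop.table(table(factor(x, levels = all_levels)))
)

props
```

Because every column now has the same levels, sapply() can simplify the result to a matrix with one row per amino acid and one column per position.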
>
> -- Cheers,
> Bert
>
> On Tue, Jul 24, 2012 at 8:18 AM, John Kane <jrkrideau at inbox.com> wrote:
>>
>> I think this does what you want using two packages, plyr and reshape2, that
>> you may have to install. If so, install.packages(c("plyr", "reshape2"))
>> should do the trick.
>> library(plyr)
>> library(reshape2)
>> # using the supplied data frame 'myfile' from below
>> time0total = sum(myfile[,2])
>> mydata <- myfile[, 2:10]
>> md1 <- melt(mydata, id = "Time_zero")
>> ddply(md1, .(variable, value), summarise, sum = sum(Time_zero)/time0total)
>>
>>
>> John Kane
>> Kingston ON Canada
>>
>> -----Original Message-----
>> From: zj29 at cornell.edu
>> Sent: Tue, 24 Jul 2012 10:25:21 -0400
>> To: jrkrideau at inbox.com
>> Subject: Re: [R] How to do the same thing for all levels of a column?
>>
>> Hi John,
>> Thank you for the tips. My apologies about the unreadable sample data...
>> So here is the output of the sample data, and hopefully it works this time
>> :)
>> myfile <- structure(list(Proteins = structure(1:4, .Label = c("p1", "p2",
>> "p3", "p4"), class = "factor"), Time_zero = c(0.0050723, 0.0002731,
>> 9.76e-05, 0.0002077), X1 = structure(c(1L, 3L, 1L, 2L), .Label = c("L",
>> "R", "T"), class = "factor"), X2 = structure(c(1L, 1L, 2L, 1L
>> ), .Label = c("E", "M"), class = "factor"), X3 = structure(c(2L,
>> 1L, 2L, 2L), .Label = c("N", "Y"), class = "factor"), X4 = structure(c(1L,
>> 2L, 3L, 2L), .Label = c("I", "L", "Q"), class = "factor"), X5 =
>> structure(c(1L,
>> 2L, 1L, 1L), .Label = c("I", "V"), class = "factor"), X6 = structure(c(1L,
>> 1L, 1L, 2L), .Label = c("P", "S"), class = "factor"), X7 = structure(c(1L,
>> 3L, 2L, 2L), .Label = c("D", "E", "G"), class = "factor"), X8 =
>> structure(c(1L,
>> 1L, 2L, 1L), .Label = c("A", "C"), class = "factor")), .Names =
>> c("Proteins",
>> "Time_zero", "X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8"), row.names =
>> c(NA,
>> 4L), class = "data.frame")
>> And here is my original question:
>> Basically, I have a bunch of protein sequences composed of different amino
>> acid residues, and each residue is represented by an uppercase letter. I
>> want to calculate the ratio of different amino acid residues at each
>> position of the proteins.
>>
>> If I name this table as myfile.txt, I have the following scripts to
>> calculate the ratio of each amino acid residue at position 1:
>>
>> # showing levels of the 3rd column, which means the types of residues
>>
>> >myfile[,3]
>>
>>
>> # calculating the ratio of L
>>
>> >list=c(which(myfile[,3]=="L"))
>>
>> >time0total=sum(myfile[,2])
>>
>> >AA_L=0
>>
>> >for (i in 1:length(list)){AA_L=sum(myfile[list[[i]],2]+AA_L)}
>>
>> >ratio_L=AA_L/time0total
>>
>>
>> So how can I write a script to do the same thing for the other two levels (T
>> and R) in column 3, and also do this for every column that contains amino
>> acid residues?
>>
>> Thanks a lot!
>>
>> Regards,
>>
>> Zhao
>> 2012/7/24 John Kane <[1]jrkrideau at inbox.com>
>>
>> First thing is to supply the data in a usable format. As is, it is
>> essentially unreadable. All R-beginners do this. :)
>> Have a look at the dput function (?dput) for a good way to supply sample
>> data in an email.
>> If you have a large dataset probably a few dozen lines of data would be
>> fine.
>> Something like dput(head(mydata)) should be fine. Just copy and paste the
>> output into your email.
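
As a rough illustration of the dput() advice, here is a small sketch with a made-up data frame (`mydata` and its columns are hypothetical). It shows that the text printed by dput() can be evaluated again to recover the data.

```r
# A made-up data frame standing in for the poster's real data.
mydata <- data.frame(
  id      = 1:10,
  residue = rep(c("L", "T"), 5),
  stringsAsFactors = FALSE
)

# Print an R expression that recreates the first six rows; this text
# is what would be pasted into the email.
dput(head(mydata))

# A reader can evaluate the pasted text to get the same data back.
pasted   <- paste(capture.output(dput(head(mydata))), collapse = "\n")
restored <- eval(parse(text = pasted))
all.equal(restored, head(mydata))
```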
>> Welcome to R. I think you will like it.
>> John Kane
>> Kingston ON Canada
>>
>> > -----Original Message-----
>> > From: [2]zj29 at cornell.edu
>> > Sent: Mon, 23 Jul 2012 18:01:11 -0400
>> > To: [3]r-help at r-project.org
>> > Subject: [R] How to do the same thing for all levels of a column?
>> >
>> > Dear all,
>> >
>> >
>> >
>> > I am an R beginner, and I am looking for a way to do the same thing for
>> > all levels of a column in a table.
>> >
>> >
>> >
>> > Basically, I have a bunch of protein sequences composed of different
>> > amino acid residues, and each residue is represented by an uppercase
>> > letter. I want to calculate the ratio of different amino acid residues
>> > at each position of the proteins. Here is an example table:
>> >
>> > Proteins  Time_zero  1  2  3  4  5  6  7  8
>> > p1        0.0050723  L  E  Y  I  I  P  D  A
>> > p2        0.0002731  T  E  N  L  V  P  G  A
>> > p3        9.757E-05  L  M  Y  Q  I  P  E  C
>> > p4        0.0002077  R  E  Y  L  I  S  E  A
>> >
>> >
>> >
>> > If I name this table as myfile.txt, I have the following scripts to
>> > calculate the ratio of each amino acid residue at position 1:
>> >
>> > # showing levels of the 3rd column, which means the types of residues
>> >
>> > >myfile[,3]
>> >
>> >
>> >
>> > # calculating the ratio of L
>> >
>> > >list=c(which(myfile[,3]=="L"))
>> >
>> > >time0total=sum(myfile[,2])
>> >
>> > >AA_L=0
>> >
>> > >for (i in 1:length(list)){AA_L=sum(myfile[list[[i]],2]+AA_L)}
>> >
>> > >ratio_L=AA_L/time0total
>> >
>> >
>> >
>> > So how can I write a script to do the same thing for the other two levels
>> > (T and R) in column 3, and also do this for every column that contains
>> > amino acid residues?
>> >
>> >
>> >
>> > Many thanks for any help you could give me on this topic! :)
>> >
>> >
>> >
>> > Regards,
>> >
>> > Zhao
>> > --
>> > Zhao JIN
>> > Ph.D. Candidate
>> > Ruth Ley Lab
>> > 467 Biotech
>> > Field of Microbiology, Cornell University
>> > Lab: 607.255.4954
>> > Cell: 412.889.3675
>> >
>>
>> > [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > [4]R-help at r-project.org mailing list
>> > [5]https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > [6]http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>>
>> References
>>
>> 1. mailto:jrkrideau at inbox.com
>> 2. mailto:zj29 at cornell.edu
>> 3. mailto:r-help at r-project.org
>> 4. mailto:R-help at r-project.org
>> 5. https://stat.ethz.ch/mailman/listinfo/r-help
>> 6. http://www.R-project.org/posting-guide.html
>> 7. http://www.inbox.com/marineaquarium
>> 8. http://www.inbox.com/earth
>> 9. http://www.inbox.com/earth
>
>
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm