[R] How to do the same thing for all levels of a column?
John Kane
jrkrideau at inbox.com
Wed Jul 25 15:44:58 CEST 2012
No, it's actually telling ddply to split by both variables (variable, value),
if I understand your question correctly.
The confusion is my fault. I tend to be lazy when running examples and did
not rename the melt() output to something meaningful. I sometimes forget
that it's not just me reading the code.
If you run:
md1 <- melt(mydata, id = "Time_zero",
            variable.name = "xvars",
            value.name = "aminos")
ddply(md1, .(xvars, aminos), summarise, sum = sum(Time_zero)/time0total)
I think it will show what is happening.
John Kane
Kingston ON Canada
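
To make the two-variable split concrete, here is a minimal base-R sketch of the same grouping, without plyr or reshape2. It is not the code posted in the thread: the data frame is a cut-down copy of the sample data (first two residue positions only), the long table is built by hand rather than with melt(), and aggregate() stands in for ddply().

```r
# Cut-down copy of the thread's sample data (positions X1 and X2 only).
myfile <- data.frame(
  Time_zero = c(0.0050723, 0.0002731, 9.76e-05, 0.0002077),
  X1 = c("L", "T", "L", "R"),
  X2 = c("E", "E", "M", "E"),
  stringsAsFactors = FALSE
)
time0total <- sum(myfile$Time_zero)

# Long format: one row per (protein, position) pair, similar to melt() output.
long <- data.frame(
  Time_zero = rep(myfile$Time_zero, times = 2),
  xvars     = rep(c("X1", "X2"), each = nrow(myfile)),
  aminos    = c(myfile$X1, myfile$X2),
  stringsAsFactors = FALSE
)

# One group per distinct (xvars, aminos) combination.
res <- aggregate(Time_zero ~ xvars + aminos, data = long, FUN = sum)
res$sum <- res$Time_zero / time0total
print(res[order(res$xvars, res$aminos), c("xvars", "aminos", "sum")])
```

Each output row is one residue at one position, which is what splitting by both columns means: the position alone would lump all residues together, and the residue alone would mix positions.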
-----Original Message-----
From: zj29 at cornell.edu
Sent: Tue, 24 Jul 2012 15:26:52 -0400
To: gunter.berton at gene.com
Subject: Re: [R] How to do the same thing for all levels of a column?
Hi John and Bert,
Thank you so much for your replies. Both of your scripts worked well, so now
I've learnt two ways to do it. :)
Bert: I was not very clear on what I wanted to do. I just would like to
calculate the residues shown in the table, not all residues. The apply
functions are amazing!
John: as I am still digesting the code, I am not sure I fully understood
the argument .(variable, value) in the ddply line. The description of ddply
says that .variables gives the variables to split the data frame by, as quoted
variables, a formula or a character vector. So does .(variable, value) tell R
to split the data frame by value, i.e. by the type of amino acid
residue?
Thank you all again.
Cheers,
Zhao
2012/7/24 Bert Gunter <[1]gunter.berton at gene.com>
... and I neglected to mention that f = myfile[,2]
Sigh.... More coffee needed.
-- Bert
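
For readers who want to run the corrected call quoted below in one go, here is a self-contained sketch that includes the definition of f. The data frame is retyped from the dput() output further down the thread; lapply() is used in place of sapply() only so that the result is always a list, which is a choice made for this sketch.

```r
# Sample data retyped from the dput() output posted in the thread.
myfile <- data.frame(
  Proteins  = c("p1", "p2", "p3", "p4"),
  Time_zero = c(0.0050723, 0.0002731, 9.76e-05, 0.0002077),
  X1 = c("L", "T", "L", "R"), X2 = c("E", "E", "M", "E"),
  X3 = c("Y", "N", "Y", "Y"), X4 = c("I", "L", "Q", "L"),
  X5 = c("I", "V", "I", "I"), X6 = c("P", "P", "P", "S"),
  X7 = c("D", "G", "E", "E"), X8 = c("A", "A", "C", "A"),
  stringsAsFactors = TRUE
)

f <- myfile[, 2]  # Time_zero: the weight attached to each protein

# For each residue column: sum the weights per residue, then normalise to 1.
props <- lapply(myfile[, -c(1, 2)],
                function(x) prop.table(tapply(f, x, sum)))
print(props$X1)
```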
On Tue, Jul 24, 2012 at 9:43 AM, Bert Gunter <[2]bgunter at gene.com> wrote:
> Sorry. Typo in my previous. Should be:
>
>> sapply(myfile[,-c(1,2)],function(x)prop.table(tapply(f,x,sum)))
> $X1
> L R T
> 0.91491320 0.03675651 0.04833030
>
> $X2
> E M
> 0.9827278 0.0172722
>
> $X3
> N Y
> 0.0483303 0.9516697
>
> $X4
> I L Q
> 0.8976410 0.0850868 0.0172722
>
> $X5
> I V
> 0.9516697 0.0483303
>
> $X6
> P S
> 0.96324349 0.03675651
>
> $X7
> D E G
> 0.8976410 0.0540287 0.0483303
>
> $X8
> A C
> 0.9827278 0.0172722
>
>
>
> On Tue, Jul 24, 2012 at 9:37 AM, Bert Gunter <[3]bgunter at gene.com> wrote:
>> OK, I admit it: I re-read what you wrote and now I'm confused. Is:
>>
>>> sapply(myfile[,-c(1,2)],function(x)prop.table(tapply(f,x)))
>>
>> X1 X2 X3 X4 X5 X6 X7 X8
>> [1,] 0.1428571 0.2 0.2857143 0.125 0.2 0.2 0.125 0.2
>> [2,] 0.4285714 0.2 0.1428571 0.250 0.4 0.2 0.375 0.2
>> [3,] 0.1428571 0.4 0.2857143 0.375 0.2 0.2 0.250 0.4
>> [4,] 0.2857143 0.2 0.2857143 0.250 0.2 0.4 0.250 0.2
>>
>> what you want?
>>
>> -- Bert
>> On Tue, Jul 24, 2012 at 9:17 AM, Bert Gunter <[4]bgunter at gene.com> wrote:
>>> The OP's request is a bit ambiguous to me: at a given residue, do you
>>> wish to calculate the proportions for only those amino acids that
>>> appear at that residue, or do you wish to include the proportions for
>>> all amino acids, some of which might then be 0.
>>>
>>> Assuming the former, then I don't think one needs to go to the lengths
>>> described by John below.
>>>
>>> Using your example (thanks!), the following seems to suffice:
>>>
>>>> sapply(myfile[,-c(1,2)],function(x)prop.table(table(x)))
>>>
>>> $X1
>>> x
>>> L R T
>>> 0.50 0.25 0.25
>>>
>>> $X2
>>> x
>>> E M
>>> 0.75 0.25
>>>
>>> $X3
>>> x
>>> N Y
>>> 0.25 0.75
>>>
>>> $X4
>>> x
>>> I L Q
>>> 0.25 0.50 0.25
>>>
>>> $X5
>>> x
>>> I V
>>> 0.75 0.25
>>>
>>> $X6
>>> x
>>> P S
>>> 0.75 0.25
>>>
>>> $X7
>>> x
>>> D E G
>>> 0.25 0.50 0.25
>>>
>>> $X8
>>> x
>>> A C
>>> 0.75 0.25
>>>
>>>
>>> This could, of course, then be modified to add zero proportions for
>>> all non-appearing amino acids.
>>>
>>> -- Cheers,
>>> Bert
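
One possible way to add the zero proportions mentioned above is to give every column the same factor levels before tabulating. This is only a sketch: "all amino acids" is taken here to mean every residue that appears anywhere in the sample data, which is an assumption (the full 20-letter alphabet could be supplied instead), and the proportions are unweighted counts as in the table() call above.

```r
# Sample data retyped from the dput() output posted in the thread.
myfile <- data.frame(
  Proteins  = c("p1", "p2", "p3", "p4"),
  Time_zero = c(0.0050723, 0.0002731, 9.76e-05, 0.0002077),
  X1 = c("L", "T", "L", "R"), X2 = c("E", "E", "M", "E"),
  X3 = c("Y", "N", "Y", "Y"), X4 = c("I", "L", "Q", "L"),
  X5 = c("I", "V", "I", "I"), X6 = c("P", "P", "P", "S"),
  X7 = c("D", "G", "E", "E"), X8 = c("A", "A", "C", "A"),
  stringsAsFactors = FALSE
)

residues <- myfile[, -c(1, 2)]

# Assumed residue alphabet: everything seen in any position of the sample.
all_aa <- sort(unique(unlist(residues)))

# With shared levels, table() reports a count of 0 for absent residues, and
# sapply() can simplify the equal-length results into a matrix.
props <- sapply(residues,
                function(x) prop.table(table(factor(x, levels = all_aa))))
print(round(props, 2))
```

Rows are residues and columns are positions, so each column sums to 1.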
>>>
>>> On Tue, Jul 24, 2012 at 8:18 AM, John Kane <[5]jrkrideau at inbox.com> wrote:
>>>>
>>>> I think this does what you want using two packages, plyr and reshape2, that
>>>> you may have to install. If so, install.packages(c("plyr", "reshape2"))
>>>> should do the trick.
>>>> library(plyr)
>>>> library(reshape2)
>>>> # using the supplied data frame 'myfile' from below
>>>> time0total <- sum(myfile[, 2])
>>>> mydata <- myfile[, 2:10]
>>>> md1 <- melt(mydata, id = "Time_zero")
>>>> ddply(md1, .(variable, value), summarise, sum = sum(Time_zero)/time0total)
>>>>
>>>>
>>>> John Kane
>>>> Kingston ON Canada
>>>>
>>>> -----Original Message-----
>>>> From: [6]zj29 at cornell.edu
>>>> Sent: Tue, 24 Jul 2012 10:25:21 -0400
>>>> To: [7]jrkrideau at inbox.com
>>>> Subject: Re: [R] How to do the same thing for all levels of a
column?
>>>>
>>>> Hi John,
>>>> Thank you for the tips. My apologies about the unreadable sample data...
>>>> So here is the output of the sample data, and hopefully it works this time
>>>> :)
>>>> myfile <- structure(list(Proteins = structure(1:4, .Label = c("p1", "p2",
>>>> "p3", "p4"), class = "factor"), Time_zero = c(0.0050723, 0.0002731,
>>>> 9.76e-05, 0.0002077), X1 = structure(c(1L, 3L, 1L, 2L), .Label = c("L",
>>>> "R", "T"), class = "factor"), X2 = structure(c(1L, 1L, 2L, 1L
>>>> ), .Label = c("E", "M"), class = "factor"), X3 = structure(c(2L,
>>>> 1L, 2L, 2L), .Label = c("N", "Y"), class = "factor"), X4 = structure(c(1L,
>>>> 2L, 3L, 2L), .Label = c("I", "L", "Q"), class = "factor"), X5 = structure(c(1L,
>>>> 2L, 1L, 1L), .Label = c("I", "V"), class = "factor"), X6 = structure(c(1L,
>>>> 1L, 1L, 2L), .Label = c("P", "S"), class = "factor"), X7 = structure(c(1L,
>>>> 3L, 2L, 2L), .Label = c("D", "E", "G"), class = "factor"), X8 = structure(c(1L,
>>>> 1L, 2L, 1L), .Label = c("A", "C"), class = "factor")), .Names = c("Proteins",
>>>> "Time_zero", "X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8"), row.names = c(NA,
>>>> 4L), class = "data.frame")
>>>> And here is my original question:
>>>> Basically, I have a bunch of protein sequences composed of different amino
>>>> acid residues, and each residue is represented by an uppercase letter. I
>>>> want to calculate the ratio of different amino acid residues at each
>>>> position of the proteins.
>>>>
>>>> If I name this table as myfile.txt, I have the following scripts to
>>>> calculate the ratio of each amino acid residue at position 1:
>>>>
>>>> # showing levels of the 3rd column, which means the types of residues
>>>> >myfile[,3]
>>>>
>>>> # calculating the ratio of L
>>>> >list=c(which(myfile[,3]=="L"))
>>>> >time0total=sum(myfile[,2])
>>>> >AA_L=0
>>>> >for (i in 1:length(list)){AA_L=sum(myfile[list[[i]],2]+AA_L)}
>>>> >ratio_L=AA_L/time0total
>>>>
>>>> So how can I write a script to do the same thing for the other two levels (T
>>>> and R) in column 3, and also do this for every column that contains amino
>>>> acid residues?
>>>>
>>>> Thanks a lot!
>>>>
>>>> Regards,
>>>>
>>>> Zhao
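
As a side note on the loop in the question above, the same ratio can be computed without a for loop by indexing with a logical vector. The sketch below assumes a cut-down copy of the sample data (only the first residue column) and keeps the column-number style of the original script; the name `ratios` is introduced here for illustration.

```r
# Cut-down copy of the sample data: column 2 is Time_zero, column 3 is X1.
myfile <- data.frame(
  Proteins  = c("p1", "p2", "p3", "p4"),
  Time_zero = c(0.0050723, 0.0002731, 9.76e-05, 0.0002077),
  X1 = c("L", "T", "L", "R"),
  stringsAsFactors = TRUE
)

time0total <- sum(myfile[, 2])

# Rows where column 3 is "L" are selected directly; no loop or which() needed.
ratio_L <- sum(myfile[myfile[, 3] == "L", 2]) / time0total

# The same expression applied to every level of column 3.
ratios <- sapply(levels(myfile[, 3]),
                 function(aa) sum(myfile[myfile[, 3] == aa, 2]) / time0total)
print(ratios)
```

Avoiding the name `list` for a variable also sidesteps confusion with the base function list().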
>>>> 2012/7/24 John Kane <[1][8]jrkrideau at inbox.com>
>>>>
>>>> First thing is to supply the data in a usable format. As is, it is
>>>> essentially unreadable. All R beginners do this. :)
>>>> Have a look at the dput function (?dput) for a good way to supply sample
>>>> data in an email.
>>>> If you have a large dataset, a few dozen lines of data would probably be
>>>> fine.
>>>> Something like dput(head(mydata)) should be fine. Just copy and paste the
>>>> output into your email.
>>>> Welcome to R. I think you will like it.
>>>> John Kane
>>>> Kingston ON Canada
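
A small sketch of the dput() round trip described above, using a made-up three-row data frame (the column names `id` and `residue` are illustrative only). The captured text is what would be pasted into an email; evaluating it on the other side rebuilds the object.

```r
# Hypothetical toy data standing in for a real dataset.
mydata <- data.frame(
  id      = 1:3,
  residue = c("L", "T", "R"),
  stringsAsFactors = FALSE
)

# dput() writes R code that recreates the object; capture it as text here
# so the same script can also demonstrate the receiving side.
txt <- paste(capture.output(dput(head(mydata))), collapse = "\n")
cat(txt, "\n")

# The recipient pastes that text into R (simulated with parse/eval).
rebuilt <- eval(parse(text = txt))
print(rebuilt)
```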
>>>>
>>>> > -----Original Message-----
>>>> > From: [2][9]zj29 at cornell.edu
>>>> > Sent: Mon, 23 Jul 2012 18:01:11 -0400
>>>> > To: [3][10]r-help at r-project.org
>>>> > Subject: [R] How to do the same thing for all levels of a column?
>>>> >
>>>> > Dear all,
>>>> >
>>>> >
>>>> >
>>>> > I am an R beginner, and I am looking for a way to do the same thing for
>>>> > all levels of a column in a table.
>>>> >
>>>> > Basically, I have a bunch of protein sequences composed of different
>>>> > amino acid residues, and each residue is represented by an uppercase
>>>> > letter. I want to calculate the ratio of different amino acid residues at
>>>> > each position of the proteins. Here is an example table:
>>>> >
>>>> > Proteins  Time_zero  1  2  3  4  5  6  7  8
>>>> > p1        0.0050723  L  E  Y  I  I  P  D  A
>>>> > p2        0.0002731  T  E  N  L  V  P  G  A
>>>> > p3        9.757E-05  L  M  Y  Q  I  P  E  C
>>>> > p4        0.0002077  R  E  Y  L  I  S  E  A
>>>> >
>>>> >
>>>> >
>>>> > If I name this table as myfile.txt, I have the following scripts to
>>>> > calculate the ratio of each amino acid residue at position 1:
>>>> >
>>>> > # showing levels of the 3rd column, which means the types of residues
>>>> > >myfile[,3]
>>>> >
>>>> > # calculating the ratio of L
>>>> > >list=c(which(myfile[,3]=="L"))
>>>> > >time0total=sum(myfile[,2])
>>>> > >AA_L=0
>>>> > >for (i in 1:length(list)){AA_L=sum(myfile[list[[i]],2]+AA_L)}
>>>> > >ratio_L=AA_L/time0total
>>>> >
>>>> > So how can I write a script to do the same thing for the other two levels
>>>> > (T and R) in column 3, and also do this for every column that contains
>>>> > amino acid residues?
>>>> >
>>>> >
>>>> >
>>>> > Many thanks for any help you could give me on this topic! :)
>>>> >
>>>> >
>>>> >
>>>> > Regards,
>>>> >
>>>> > Zhao
>>>> > --
>>>> > Zhao JIN
>>>> > Ph.D. Candidate
>>>> > Ruth Ley Lab
>>>> > 467 Biotech
>>>> > Field of Microbiology, Cornell University
>>>> > Lab: 607.255.4954
>>>> > Cell: 412.889.3675
>>>> >
>>>>
>>>> > [[alternative HTML version deleted]]
>>>> >
>>>> > ______________________________________________
>>>> > [4][11]R-help at r-project.org mailing list
>>>> > [5][12]https://stat.ethz.ch/mailman/listinfo/r-help
>>>> > PLEASE do read the posting guide
>>>> > [6][13]http://www.R-project.org/posting-guide.html
>>>> > and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>> References
>>>>
>>>> 1. mailto:[16]jrkrideau at inbox.com
>>>> 2. mailto:[17]zj29 at cornell.edu
>>>> 3. mailto:[18]r-help at r-project.org
>>>> 4. mailto:[19]R-help at r-project.org
>>>> 5. [20]https://stat.ethz.ch/mailman/listinfo/r-help
>>>> 6. [21]http://www.R-project.org/posting-guide.html
>>>> 7. [22]http://www.inbox.com/marineaquarium
>>>> 8. [23]http://www.inbox.com/earth
>>>> 9. [24]http://www.inbox.com/earth
>>>
>>>
>>>
>>> --
>>>
>>> Bert Gunter
>>> Genentech Nonclinical Biostatistics
>>>
>>> Internal Contact Info:
>>> Phone: 467-7374
>>> Website:
>>> [28]http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
References
1. mailto:gunter.berton at gene.com
2. mailto:bgunter at gene.com
3. mailto:bgunter at gene.com
4. mailto:bgunter at gene.com
5. mailto:jrkrideau at inbox.com
6. mailto:zj29 at cornell.edu
7. mailto:jrkrideau at inbox.com
8. mailto:jrkrideau at inbox.com
9. mailto:zj29 at cornell.edu
10. mailto:r-help at r-project.org
11. mailto:R-help at r-project.org
12. https://stat.ethz.ch/mailman/listinfo/r-help
13. http://www.R-project.org/posting-guide.html
14. http://www.inbox.com/marineaquarium
15. http://www.inbox.com/earth
16. mailto:jrkrideau at inbox.com
17. mailto:zj29 at cornell.edu
18. mailto:r-help at r-project.org
19. mailto:R-help at r-project.org
20. https://stat.ethz.ch/mailman/listinfo/r-help
21. http://www.R-project.org/posting-guide.html
22. http://www.inbox.com/marineaquarium
23. http://www.inbox.com/earth
24. http://www.inbox.com/earth
25. mailto:R-help at r-project.org
26. https://stat.ethz.ch/mailman/listinfo/r-help
27. http://www.R-project.org/posting-guide.html
28. http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
29. http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
30. http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
31. http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
32. http://www.inbox.com/marineaquarium
33. http://www.inbox.com/marineaquarium