[BioC] Sorting matrix by column
James W. MacDonald
jmacdon at uw.edu
Tue Oct 23 17:41:02 CEST 2012
What do you get from
class(x)
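
For context on why class(x) matters here: the quoted printout shows character cells in quotes next to unquoted numbers, which is consistent with how R prints a matrix whose cells are list elements, and order() cannot sort a list. A minimal sketch, assuming x is such a matrix of mode "list" (the toy matrix below is hypothetical, built from three rows of the posted data):

```r
# Hypothetical toy data: a matrix of mode "list". R prints character cells
# in quotes and numeric cells without, much like the matrix in the post.
x <- matrix(
  list("10371400", "10453900", "10375051",
       "Lypla1", "Tcea1", "Atp6v1h",
       0.3592492, 0.1886117, 0.6713107),
  nrow = 3,
  dimnames = list(NULL, c("ID", "Gene Symbol", "logFC"))
)

class(x)   # reports a matrix
typeof(x)  # "list": the cells are list elements, not an atomic vector

# order() cannot sort a list, so this call fails; try() captures the error.
res <- try(order(x[, 2]), silent = TRUE)
inherits(res, "try-error")

# Flattening the column to an atomic vector makes it sortable.
ord <- order(unlist(x[, 2]))
x_sorted <- x[ord, ]

# Converting to a data frame with atomic columns avoids the problem entirely.
df <- data.frame(
  ID     = unlist(x[, "ID"]),
  Symbol = unlist(x[, "Gene Symbol"]),
  logFC  = unlist(x[, "logFC"]),
  stringsAsFactors = FALSE
)
df[order(df$Symbol), ]
```

If typeof(x) reports "list", the quotes in the printout are only a display detail; the underlying storage is what order() objects to.
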
On 10/23/2012 11:38 AM, Kasoji, Manjula (NIH/NCI) [C] wrote:
> Hi Jim,
>
> The R session info below does correspond to the session I pasted. When I
> tried your suggestion, I still get an error:
>
> > x[base::order(x[,2]),]
> Error in base::order(x[, 2]) :
> unimplemented type 'list' in 'orderVector1'
>
>
> I see that you don't have quotes around the ID and Gene Symbol names in
> your matrix. Is there a way to remove the quotes?
>
> Thanks!
>
> On 10/23/12 11:27AM, "James W. MacDonald" <jmacdon at uw.edu> wrote:
>
>>
>> On 10/23/2012 11:15 AM, Guest [guest] wrote:
>>> Hi,
>>>
>>> I would like to sort a matrix by a specific column (column 2). I tried
>>> the order() function, but I get an error. I think it is because the
>>> values in column 2 are not numeric, they are gene symbols. This may be a
>>> general R question, but I thought I would post it here since it is
>>> microarray data analysis.
>>>
>>> I have matrix x:
>>>
>>> > x
>>>          ID         Gene Symbol     logFC      Adj.PVal
>>> 10344624 "10371400" "Lypla1"        0.3592492  0.9999522
>>> 10344633 "10453900" "Tcea1"         0.1886117  0.9999522
>>> 10344637 "10375051" "Atp6v1h"       0.6713107  0.9999522
>>> 10344653 "10575211" "Oprk1"         -0.2342731 0.9999522
>>> 10344658 "10566254" "Rb1cc1"        1.790676   0.9999522
>>> 10344674 "10602372" "Fam150a"       1.397496   0.9999522
>>> 10344679 "10398428" "St18"          -0.3278807 0.9999522
>>> 10344707 "10383518" "Pcmtd1"        -0.2231074 0.9999522
>>> 10344713 "10397054" "Ahcy"          -0.1844897 0.9999522
>>> 10344723 "10384020" "Rrs1"          -0.2322781 0.9999522
>>> 10344725 "10608710" "Adhfe1"        0.5993566  0.9999522
>>> 10344741 "10363762" "Hnrnpa3"       -0.2660978 0.9999522
>>> 10344743 "10375058" "3110035E14Rik" 0.9178868  0.9999522
>>> 10344750 "10381603" "Sgk3"          -0.2961638 0.9999522
>>> 10344772 "10442373" "6030422M02Rik" -0.1653454 0.9999522
>>> 10344789 "10421227" "Cspp1"         -0.1480766 0.9999522
>>> 10344799 "10534966" "Cspp1"         -0.2436361 0.9999522
>>> 10344801 "10398408" "Cspp1"         -0.4040665 0.9999522
>>> 10344803 "10398418" "Cspp1"         -0.2556627 0.9999522
>>> 10344805 "10572772" "Cspp1"         -0.1864641 0.9999522
>>>
>>> I want to sort on the "Gene Symbol" column so that I can remove the
>>> duplicates and keep the one with the highest log fold change.
>>>
>>> I tried the following and received an error.
>>> > x[order(x[,2]),]
>>> Error in order(x[, 2]) : unimplemented type 'list' in 'orderVector1'
>> I am not sure the sessionInfo() you give below corresponds to the
>> session above. I get:
>>
>> > x <- data.frame(ID = 12345:12354,
>> +                 Gene = Rkeys(mogene10sttranscriptclusterSYMBOL)[5001:5010],
>> +                 logFC = rnorm(10), pval = runif(10))
>> > x
>>       ID   Gene       logFC      pval
>> 1  12345  Sepw1  0.56914952 0.4916910
>> 2  12346  Serf1  0.83929962 0.4816986
>> 3  12347 Gm4748  0.12462117 0.9372249
>> 4  12348   Sez6 -0.21468480 0.4921201
>> 5  12349  Foxp3 -1.36283694 0.4575675
>> 6  12350  Sfpi1  1.03632565 0.5251826
>> 7  12351  Sfrp1  0.04689108 0.3068112
>> 8  12352   Frzb  0.08379607 0.1509499
>> 9  12353  Sfrp4 -1.61513620 0.9336235
>> 10 12354  Srsf2  1.56222316 0.2571122
>> > x[order(x[,2]),]
>>       ID   Gene       logFC      pval
>> 5  12349  Foxp3 -1.36283694 0.4575675
>> 8  12352   Frzb  0.08379607 0.1509499
>> 3  12347 Gm4748  0.12462117 0.9372249
>> 1  12345  Sepw1  0.56914952 0.4916910
>> 2  12346  Serf1  0.83929962 0.4816986
>> 4  12348   Sez6 -0.21468480 0.4921201
>> 6  12350  Sfpi1  1.03632565 0.5251826
>> 7  12351  Sfrp1  0.04689108 0.3068112
>> 9  12353  Sfrp4 -1.61513620 0.9336235
>> 10 12354  Srsf2  1.56222316 0.2571122
>>
>> It appears you have something loaded that thinks you want to use the
>> orderVector1() function. You can always specify the function you are
>> intending with the :: operator (in this case, you want base::order()).
>>
>> > x[base::order(x[,2]),]
>>       ID   Gene       logFC      pval
>> 5  12349  Foxp3 -1.36283694 0.4575675
>> 8  12352   Frzb  0.08379607 0.1509499
>> 3  12347 Gm4748  0.12462117 0.9372249
>> 1  12345  Sepw1  0.56914952 0.4916910
>> 2  12346  Serf1  0.83929962 0.4816986
>> 4  12348   Sez6 -0.21468480 0.4921201
>> 6  12350  Sfpi1  1.03632565 0.5251826
>> 7  12351  Sfrp1  0.04689108 0.3068112
>> 9  12353  Sfrp4 -1.61513620 0.9336235
>> 10 12354  Srsf2  1.56222316 0.2571122
>>
>> Best,
>>
>> Jim
>>
>>
>>> If anyone has any suggestions for an easy way to sort a significant
>>> gene list, remove duplicated values, and keep the value with highest
>>> fold change, that would be helpful!
>>>
>>> I've posted my session info below.
>>>
>>> Thanks!
>>>
>>> Guest
>>>
>>> -- output of sessionInfo():
>>>
>>> > sessionInfo()
>>> R version 2.15.1 (2012-06-22)
>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> loaded via a namespace (and not attached):
>>> [1] tools_2.15.1
>>>
>>> --
>>> Sent via the guest posting facility at bioconductor.org.
>>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
>>
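
On the original request to sort the gene list, drop duplicated symbols, and keep the entry with the highest fold change: one possible approach is sketched below. It assumes the data have first been converted to a data frame with atomic columns, and that "highest" means the largest signed logFC. The data frame df and its column names are hypothetical; the values are taken from the post.

```r
# Hypothetical data frame with atomic columns; values taken from the post.
df <- data.frame(
  Symbol = c("Lypla1", "Cspp1", "Cspp1", "Tcea1", "Cspp1"),
  logFC  = c(0.3592492, -0.1480766, -0.2436361, 0.1886117, -0.4040665),
  stringsAsFactors = FALSE
)

# Sort by symbol, then by decreasing logFC within each symbol.
# (Assumption: "highest" means largest signed value; use -abs(df$logFC)
# instead to rank by magnitude.)
ord <- order(df$Symbol, -df$logFC)
sorted <- df[ord, ]

# duplicated() flags every occurrence after the first, so dropping the
# flagged rows keeps the top-ranked row for each symbol.
best <- sorted[!duplicated(sorted$Symbol), ]
best
```

Sorting before calling duplicated() is what decides which row survives, so the ordering key is the place to encode whichever notion of "highest" is intended.
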