[BioC] Sorting matrix by column
James W. MacDonald
jmacdon at uw.edu
Tue Oct 23 17:27:41 CEST 2012
On 10/23/2012 11:15 AM, Guest [guest] wrote:
> Hi,
>
> I would like to sort a matrix by a specific column (column 2). I tried the order() function, but I get an error. I think it is because the values in column 2 are not numeric, they are gene symbols. This may be a general R question, but I thought I would post it here since it is microarray data analysis.
>
> I have matrix x:
>
>> x
> ID Gene Symbol logFC Adj.PVal
> 10344624 "10371400" "Lypla1" 0.3592492 0.9999522
> 10344633 "10453900" "Tcea1" 0.1886117 0.9999522
> 10344637 "10375051" "Atp6v1h" 0.6713107 0.9999522
> 10344653 "10575211" "Oprk1" -0.2342731 0.9999522
> 10344658 "10566254" "Rb1cc1" 1.790676 0.9999522
> 10344674 "10602372" "Fam150a" 1.397496 0.9999522
> 10344679 "10398428" "St18" -0.3278807 0.9999522
> 10344707 "10383518" "Pcmtd1" -0.2231074 0.9999522
> 10344713 "10397054" "Ahcy" -0.1844897 0.9999522
> 10344723 "10384020" "Rrs1" -0.2322781 0.9999522
> 10344725 "10608710" "Adhfe1" 0.5993566 0.9999522
> 10344741 "10363762" "Hnrnpa3" -0.2660978 0.9999522
> 10344743 "10375058" "3110035E14Rik" 0.9178868 0.9999522
> 10344750 "10381603" "Sgk3" -0.2961638 0.9999522
> 10344772 "10442373" "6030422M02Rik" -0.1653454 0.9999522
> 10344789 "10421227" "Cspp1" -0.1480766 0.9999522
> 10344799 "10534966" "Cspp1" -0.2436361 0.9999522
> 10344801 "10398408" "Cspp1" -0.4040665 0.9999522
> 10344803 "10398418" "Cspp1" -0.2556627 0.9999522
> 10344805 "10572772" "Cspp1" -0.1864641 0.9999522
>
> I want to sort on the "Gene Symbol" column so that I can remove the duplicates and keep the one with the highest log fold change.
>
> I tried the following and received an error.
>> x[order(x[,2]),]
> Error in order(x[, 2]) : unimplemented type 'list' in 'orderVector1'
I am not sure the sessionInfo() you give below corresponds to the
session above. I get:
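> library(mogene10sttranscriptcluster.db)  # also attaches AnnotationDbi, which provides Rkeys()
> x <- data.frame(ID = 12345:12354,
+                 Gene = Rkeys(mogene10sttranscriptclusterSYMBOL)[5001:5010],
+                 logFC = rnorm(10), pval = runif(10))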
> x
      ID   Gene       logFC      pval
1  12345  Sepw1  0.56914952 0.4916910
2  12346  Serf1  0.83929962 0.4816986
3  12347 Gm4748  0.12462117 0.9372249
4  12348   Sez6 -0.21468480 0.4921201
5  12349  Foxp3 -1.36283694 0.4575675
6  12350  Sfpi1  1.03632565 0.5251826
7  12351  Sfrp1  0.04689108 0.3068112
8  12352   Frzb  0.08379607 0.1509499
9  12353  Sfrp4 -1.61513620 0.9336235
10 12354  Srsf2  1.56222316 0.2571122
> x[order(x[,2]),]
      ID   Gene       logFC      pval
5  12349  Foxp3 -1.36283694 0.4575675
8  12352   Frzb  0.08379607 0.1509499
3  12347 Gm4748  0.12462117 0.9372249
1  12345  Sepw1  0.56914952 0.4916910
2  12346  Serf1  0.83929962 0.4816986
4  12348   Sez6 -0.21468480 0.4921201
6  12350  Sfpi1  1.03632565 0.5251826
7  12351  Sfrp1  0.04689108 0.3068112
9  12353  Sfrp4 -1.61513620 0.9336235
10 12354  Srsf2  1.56222316 0.2571122
It appears you have something loaded that thinks you want to use the
orderVector1() function. You can always specify the function you
intend with the :: operator (in this case, you want base::order()).
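If you want to confirm whether something on your search path is masking order(), utils::find() lists every attached environment that defines a given name. A minimal sketch:

```r
# Every environment on the search path that defines an object called "order",
# listed in search order; a bare order() call resolves to the first one
hits <- find("order")
print(hits)
# Anything listed before "package:base" masks base::order();
# calling base::order() explicitly bypasses it
```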
> x[base::order(x[,2]),]
      ID   Gene       logFC      pval
5  12349  Foxp3 -1.36283694 0.4575675
8  12352   Frzb  0.08379607 0.1509499
3  12347 Gm4748  0.12462117 0.9372249
1  12345  Sepw1  0.56914952 0.4916910
2  12346  Serf1  0.83929962 0.4816986
4  12348   Sez6 -0.21468480 0.4921201
6  12350  Sfpi1  1.03632565 0.5251826
7  12351  Sfrp1  0.04689108 0.3068112
9  12353  Sfrp4 -1.61513620 0.9336235
10 12354  Srsf2  1.56222316 0.2571122
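One more thing worth checking, and this is an assumption based only on your printout: some of your columns print with quotes and others without, which is how R prints a matrix whose storage mode is "list" (for example, the result of do.call(rbind, ...) over a list of lists). For such a matrix x[,2] is itself a list, and base R's order() cannot sort a list, so it stops with this same 'orderVector1' error even when no other packages are attached. A minimal sketch using the first three rows of your posted data:

```r
# A matrix of mode "list" built from the first three posted rows
# (matrix() fills by column: ID, then Symbol, then logFC)
x <- matrix(list("10371400", "10453900", "10375051",
                 "Lypla1",   "Tcea1",    "Atp6v1h",
                 0.3592492,  0.1886117,  0.6713107),
            nrow = 3,
            dimnames = list(NULL, c("ID", "Symbol", "logFC")))

mode(x)           # "list": every cell is its own list element
is.list(x[, 2])   # TRUE, and a list is what order() cannot handle

# Flatten the column to an ordinary character vector, then order on it
x.sorted <- x[order(unlist(x[, 2])), ]

# Often simpler: convert to a data.frame with atomic columns first
x.df <- data.frame(ID     = unlist(x[, "ID"]),
                   Symbol = unlist(x[, "Symbol"]),
                   logFC  = unlist(x[, "logFC"]),
                   stringsAsFactors = FALSE)
x.df[order(x.df$Symbol), ]
```

Converting to a data.frame is the more convenient route if you plan further filtering, since each column then behaves like a normal vector.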
Best,
Jim
>
> If anyone has any suggestions for an easy way to sort a significant gene list, remove duplicated values, and keep the value with highest fold change, that would be helpful!
>
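For the duplicate question, one common approach is to sort by gene symbol and then by decreasing logFC, and keep only the first row for each symbol with duplicated(). This is a minimal sketch that assumes the results are in a data.frame with columns named Gene and logFC (hypothetical names; adjust to your own) and uses made-up toy values:

```r
# Toy data (hypothetical values), one row per probeset
res <- data.frame(Gene  = c("Cspp1", "Cspp1", "Sgk3", "Cspp1", "Ahcy"),
                  logFC = c(-0.15, -0.24, -0.30, 0.40, -0.18),
                  stringsAsFactors = FALSE)

# Sort by gene symbol, then by decreasing logFC within each symbol
res.sorted <- res[order(res$Gene, -res$logFC), ]

# duplicated() flags every repeat after the first occurrence, so keeping
# the non-duplicated rows retains the highest-logFC row for each symbol
res.best <- res.sorted[!duplicated(res.sorted$Gene), ]
res.best
```

If "highest" should mean largest in magnitude rather than most positive, order on -abs(res$logFC) instead.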
> I've posted my session info below.
>
> Thanks!
>
> Guest
>
> -- output of sessionInfo():
>
>> sessionInfo()
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> loaded via a namespace (and not attached):
> [1] tools_2.15.1
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099