[BioC] clustering in R
James W. MacDonald
jmacdon at uw.edu
Tue Oct 23 16:36:03 CEST 2012
Hi Priya,
On 10/23/2012 3:34 AM, priya [guest] wrote:
> I have an RMA-normalized gene expression dataset with 22810 rows and 9 columns (types of promoters); a subset of the data is as follows:
>
> ID_REF GSM362180 GSM362181 GSM362188 GSM362189 GSM362192
> 244901 5.094871713 4.626623079 4.554272515 4.748604391 4.759221647
> 244902 5.194528083 4.985930299 4.817426064 5.151654407 4.838741605
> 244903 5.412329253 5.352970877 5.06250609 5.305709079 8.365082403
> 244904 5.529220594 5.28134657 5.467445095 5.62968933 5.458388909
> 244905 5.024052699 4.714631878 4.792865831 4.843975286 4.657188246
> 244906 5.786557533 5.242403911 5.060605782 5.458148567 5.890061836
>
> I want to do a clustering of the above and tried the hierarchical clustering:
>
> d<- dist(as.matrix(deg), method = "euclidean")
> where deg is a matrix of the differentially expressed genes (4300 in number). I get the following warning:
>
> Warning message:
> In dist(as.matrix(deg), method = "euclidean") : NAs introduced by coercion
>
> Is it all right to proceed with the clustering in spite of the warning?
Well, you shouldn't get that warning if your matrix is all numeric. And
if your matrix isn't all numeric, as.matrix() will usually coerce all of
it to character, so I would check that and see what is happening before
going any further with the clustering.
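As a rough sketch of how one might check this: the toy 'deg' below is a
made-up stand-in for your data, and the assumption that the probe IDs
ended up as a regular column is only a guess at one possible cause, not
something I can tell from your post.

```r
# Hypothetical stand-in for 'deg': a data frame whose first column holds IDs.
deg <- data.frame(ID_REF = c("244901", "244902", "probeX"),
                  GSM362180 = c(5.09, 5.19, 5.41),
                  GSM362181 = c(4.63, 4.99, 5.35),
                  stringsAsFactors = FALSE)

# One non-numeric column is enough to make the whole matrix character.
m <- as.matrix(deg)
print(mode(m))

# List the class of each column to find the offending one(s).
col_classes <- sapply(deg, class)
print(col_classes)

# Assumed fix for this toy case: move the IDs into the row names and
# keep only the numeric columns.
expr <- as.matrix(deg[, -1])
rownames(expr) <- deg$ID_REF
stopifnot(is.numeric(expr))

# The distance matrix is now computed on purely numeric data.
d <- dist(expr, method = "euclidean")
```

If sapply(deg, class) reports anything other than "numeric" or "integer"
for a column, that column is where I would start looking.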
>
>
> hc<- hclust(d)
> plot(hc, hang = -0.01, cex = 0.7)
>
> I get a dendrogram which is very dense, and the labels are not clear. Also, I do not know which of the 9 promoters are classified in the tree for the several genes. How would it be possible to label the tree with the promoters, and how can I draw the genes as a clearer dendrogram? There are around 4300 genes, and I would like a dendrogram that is easier to read.
That is a lot of genes, so you will have to make the dendrogram really
big if you actually want to see things. The best thing to do IMO is to
put it in a pdf of the correct size, and then you can zoom in and look
at different regions. It would probably be easiest to make the pdf
really wide, so something like
pdf("dendrogram.pdf", width = 200, height = 8)
plot(hc, hang = -0.01, cex = 0.7)
dev.off()
As for the promoters being classified by the tree, I am not sure what
you are asking. If it is simply a labeling issue, note that your 'hc'
object is a list with a 'labels' member that contains whatever is going
to be used in labeling the dendrogram. If you want to change what the
labels are, then you can modify that.
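A minimal sketch of what I mean by modifying the labels, using a small
random matrix in place of your data. The replacement names ("geneA" and
so on) and the output file name are invented purely for illustration.

```r
# Toy expression matrix: 6 probes by 5 samples, with probe IDs as row names.
set.seed(1)
probe_ids <- c("244901", "244902", "244903", "244904", "244905", "244906")
expr <- matrix(rnorm(6 * 5), nrow = 6,
               dimnames = list(probe_ids, paste0("sample", 1:5)))

hc <- hclust(dist(expr, method = "euclidean"))

# The labels default to the row names of the matrix given to dist().
print(hc$labels)

# Hypothetical lookup from probe ID to the label you would rather see.
new_names <- setNames(paste0("gene", LETTERS[1:6]), probe_ids)

# Replace the labels by looking up each existing one, which keeps the
# order that hclust() expects.
hc$labels <- unname(new_names[hc$labels])

# Write the relabelled dendrogram to a pdf in the temporary directory.
out_file <- file.path(tempdir(), "dendrogram_relabelled.pdf")
pdf(out_file, width = 8, height = 6)
plot(hc, hang = -0.01, cex = 0.7)
dev.off()
```

The important part is to index by the existing labels rather than to
assign a new vector in some other order, so that each leaf keeps the
label that belongs to it.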
Best,
Jim
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099