[BioC] Assigning gene symbols to Affymetrix data and averaging probes
James W. MacDonald
jmacdon at uw.edu
Wed Oct 3 17:30:34 CEST 2012
Hi Lesley,
On 10/3/2012 10:55 AM, Hoyles, Lesley wrote:
> Hi
>
> I have processed my affy data and am able to annotate the object
> mice.loess using the following:
>
> ID <- featureNames(mice.loess)
> Symbol <- getSYMBOL(ID, 'mouse4302.db')
> fData(mice.loess) <- data.frame(ID = ID, Symbol = Symbol)
>
>
> However, when I convert my object as follows,
>
> expr.loess <- exprs(mice.loess)
>
> I lose the annotation and have been unable to find a way to annotate
> expr.loess. Please could anybody suggest how I can annotate expr.loess?
expr.loess <- data.frame(ID = ID, Symbol = Symbol, exprs(mice.loess))
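A self-contained sketch of the same idea in base R, with a small made-up matrix standing in for exprs(mice.loess). The probeset IDs, symbols and sample names are hypothetical. It also adds check.names = FALSE, on the assumption that your sample names might start with a digit (data.frame() would otherwise prefix them with an X), plus a quick check that the annotation still lines up with the expression rows.

```r
## Toy stand-in for exprs(mice.loess): rows are probesets, columns are arrays.
## The IDs, symbols and sample names below are hypothetical.
m <- matrix(c(7.1, 7.3,
              5.0, 5.2,
              9.8, 9.6),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("p1_at", "p2_at", "p3_at"),
                            c("01_ctrl.CEL", "02_treat.CEL")))
ID <- rownames(m)                  # plays the role of featureNames(mice.loess)
Symbol <- c("GeneA", "GeneB", NA)  # plays the role of getSYMBOL(ID, 'mouse4302.db')

## Put the annotation in front of the expression values; check.names = FALSE
## keeps sample names such as "01_ctrl.CEL" from becoming "X01_ctrl.CEL".
expr.loess <- data.frame(ID = ID, Symbol = Symbol, m,
                         check.names = FALSE, stringsAsFactors = FALSE)

## Sanity check: the annotation rows line up with the expression rows.
stopifnot(identical(expr.loess$ID, rownames(m)))
print(expr.loess)
```

Since you have already stored ID and Symbol in fData(), cbind(fData(mice.loess), exprs(mice.loess)) should give an equivalent table.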
>
>
> Is there a way of averaging probes for each gene with Affymetrix
> data? I've been able to do this with single-channel Agilent data
> using the example given in the limma guide.
There are probably two reasonable ways to do this. The easiest is to install the MBNI custom CDF package for this array and then do
dat <- ReadAffy(cdfname = "mouse4302mmentrezcdf")
and proceed from there. This uses the MBNI re-mapped CDF package based on
Entrez Gene IDs, so you will have a single value per gene after
summarization. There are other ways to map the probes; see the bottom of
http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download.asp
for more info.
Alternatively, if you want to stick with the original probesets, you have
to decide what to do with the probesets that are not well annotated. In
addition, gene symbols are not guaranteed to be unique, so you can't
assume that they are. Entrez Gene and UniGene IDs are supposed to be
unique, so you could use one of those instead, doing something like this
(untested):
library(mouse4302.db)
gns <- toTable(mouse4302ENTREZID)  ## columns: probe_id, gene_id
## expr.loess is the data.frame I suggest above
alldat <- merge(gns, expr.loess, by = 1)
## one data.frame per Entrez Gene ID
alldatlst <- split(alldat, alldat$gene_id)
combined.data <- do.call("rbind", lapply(alldatlst, function(x)
    data.frame(x[1, 1:3], t(colMeans(x[, -(1:3), drop = FALSE])),
               check.names = FALSE)))
Here I am assuming that after the merge() step the first three columns
are the probeset ID, gene_id and Symbol, and the remaining columns are the
expression values (so the probeset ID kept in combined.data is just that
of the first probeset for each gene). You will lose all data for which
there isn't an Entrez Gene ID, but the same is true of the MBNI method I
outlined above.
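To make the merge-then-average step concrete without needing the annotation package, here is a small base-R sketch on toy data; all probeset IDs, Entrez Gene IDs and values are made up. Instead of the tapply()/do.call() loop it uses rowsum() to add up the probesets in each gene_id group and then divides by the group sizes, which gives the same per-gene means. It also illustrates the caveat above: merge() is an inner join, so a probeset with no Entrez Gene ID (p4_at here) silently disappears. Because toTable() returns one row per probeset/gene pair, a probeset mapped to two genes would contribute to both averages.

```r
## Hypothetical probeset-to-gene table in the layout returned by
## toTable(mouse4302ENTREZID): one row per probe_id/gene_id pair.
gns <- data.frame(probe_id = c("p1_at", "p2_at", "p3_at"),
                  gene_id  = c("11111", "11111", "22222"),
                  stringsAsFactors = FALSE)

## Hypothetical expression table laid out like expr.loess above.
## "p4_at" has no Entrez Gene ID, so merge() will drop it.
expr.loess <- data.frame(ID = c("p1_at", "p2_at", "p3_at", "p4_at"),
                         Symbol = c("GeneA", "GeneA", "GeneB", NA),
                         s1 = c(6, 8, 5, 3),
                         s2 = c(7, 9, 4, 2),
                         stringsAsFactors = FALSE)

## Inner join on the probeset ID: columns are probe_id, gene_id, Symbol, s1, s2
alldat <- merge(gns, expr.loess, by = 1)

## Sum the expression columns within each gene_id, then divide by the number
## of probesets per gene to get the mean. rowsum() returns groups in sorted order.
expr.mat <- as.matrix(alldat[, -(1:3), drop = FALSE])
sums <- rowsum(expr.mat, alldat$gene_id)
n.probes <- as.vector(table(alldat$gene_id)[rownames(sums)])
gene.means <- sums / n.probes   # row i is divided by n.probes[i]

print(gene.means)
```

If you would rather stay with limma, as in the Agilent example from the limma guide, its avereps() function averages the rows of a matrix that share an ID and is probably the closest analogue.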
Best,
Jim
>
>
> Thanks in advance for your help.
>
> Best wishes,
> Lesley
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099