[BioC] Analysis and annotation (full) of Affymetrix Mouse Exon 1.0 ST arrays
James W. MacDonald
jmacdon at uw.edu
Wed Jun 13 16:47:03 CEST 2012
Hi Andreas,
On 6/13/2012 3:14 AM, Andreas Heider wrote:
> Dear mailing list,
> I know this was on the list a couple of times, and I think I read it all, but
> actually I still don't get it right. So here is my problem:
>
> I want to be able to work with Mouse Exon 1.0 ST Arrays (NOT Mouse Gene 1.0
> ST) in a similar fashion to e.g. HG-U133 arrays.
> That means, I want to finally have it accessible as an ExpressionSet object
> with the right Bioconductor annotation assigned. This should include GENE
> SYMBOLS, RefSeq IDs and ENTREZ IDs.
The problem here is that you want to do something that AFAIK isn't easy
to do. The Gene ST arrays allow you to summarize all the probes that
interrogate a particular transcript (i.e., all the exon-level probesets
are collapsed to transcript level, and then you summarize). However, for
the Exon ST arrays that isn't the case, unless there is something in xps
to allow for that - I know next to nothing about that package, so
Cristian Stratowa will have to chime in if I am missing something.
For the Exon chips, you are always summarizing at the same probeset
level, where there are <= 4 probes per probeset, and there can be any
number of probesets that interrogate a given exon. Lots of these
probesets interrogate regions that aren't even transcribed, according to
current knowledge of the genome. When you choose core, extended or full
probesets, you are just changing the number of probesets being used, not
summarizing at a different level as with the Gene ST chip.
So when you say you want gene symbols, RefSeq IDs and Entrez Gene IDs, what
exactly are you after? If a given probeset is in the intron of a gene do
you want to annotate it as being part of that gene? How about if it is
in the UTR (or really close to the UTR)? What do you want to do with the
probesets where one or more of the probes binds in multiple positions in
the genome? These are all questions that the exonmap package tries to
consider, and it gets really complicated. That's why Affy went with the
Gene ST chips - they unleashed the Exon chips on us and couldn't sell
them because people were saying WTF do I do with this thing?
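To make those questions concrete, here is a minimal Python sketch of one possible set of answers. It is not what exonmap does; the Gene record, the classify() function, the 1000-base flank and all coordinates are made up for illustration. It labels a probeset as coding exon, UTR, intron, flanking or intergenic relative to a single gene model, using closed intervals on one chromosome:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gene:
    """Hypothetical gene model; all coordinates are closed intervals."""
    symbol: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exons: List[Tuple[int, int]]


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if the closed intervals [a_start, a_end] and [b_start, b_end] share a base."""
    return a_start <= b_end and b_start <= a_end


def classify(start: int, end: int, gene: Gene, flank: int = 1000) -> str:
    """Label a probeset relative to one gene on the same chromosome."""
    # Outside the transcript plus the flank on either side: not this gene at all.
    if not overlaps(start, end, gene.tx_start - flank, gene.tx_end + flank):
        return "intergenic"
    # Inside the flank but outside the transcript ("really close to the UTR").
    if not overlaps(start, end, gene.tx_start, gene.tx_end):
        return "flanking"
    # Check the coding part of each exon first, then the exon as a whole.
    for ex_start, ex_end in gene.exons:
        coding_start = max(ex_start, gene.cds_start)
        coding_end = min(ex_end, gene.cds_end)
        if coding_start <= coding_end and overlaps(start, end, coding_start, coding_end):
            return "coding_exon"
    if any(overlaps(start, end, s, e) for s, e in gene.exons):
        return "utr"
    return "intron"


# Made-up gene model, for illustration only.
demo = Gene("GeneA", tx_start=1000, tx_end=5000, cds_start=1200, cds_end=4800,
            exons=[(1000, 1500), (3000, 3400), (4500, 5000)])

for ps_start, ps_end in [(1300, 1330), (1050, 1080), (2000, 2030), (5200, 5230)]:
    print(ps_start, ps_end, classify(ps_start, ps_end, demo))
```

Whether "intron" or "flanking" probesets should inherit the gene's symbol is exactly the policy decision you would have to make. Probesets whose probes bind in multiple positions would need this run once per genomic hit, plus a rule for reconciling the results.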
I don't think there is an easy or obvious answer to your question. If
you were to come up with what you think are reasonable answers to my
questions, then it wouldn't be much work to extract the chr, start, end
from the pd.moex.1.0.st.v1 package, and then use GenomicFeatures and
GenomicRanges (e.g., findOverlaps()) to decide what regions are being
interrogated, and annotate from there.
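The overlap step itself is conceptually simple. The Python sketch below illustrates the idea only and is not the Bioconductor API: annotate_probesets(), the probeset IDs, gene names and coordinates are all hypothetical, and it does a naive scan per chromosome. It takes (id, chr, start, end) rows for probesets and (symbol, chr, start, end) rows for genes, and reports every gene whose range overlaps each probeset:

```python
from collections import defaultdict


def annotate_probesets(probesets, genes):
    """Map each probeset ID to the sorted list of gene symbols it overlaps.

    probesets: iterable of (probeset_id, chrom, start, end)
    genes:     iterable of (symbol, chrom, start, end)
    Coordinates are treated as closed intervals on the same strand-agnostic axis.
    """
    # Group genes by chromosome so each probeset is only compared to local genes.
    genes_by_chrom = defaultdict(list)
    for symbol, chrom, start, end in genes:
        genes_by_chrom[chrom].append((start, end, symbol))

    annotation = {}
    for probeset_id, chrom, start, end in probesets:
        hits = [symbol
                for g_start, g_end, symbol in genes_by_chrom.get(chrom, [])
                if start <= g_end and g_start <= end]
        # An empty list means the probeset falls outside every known gene.
        annotation[probeset_id] = sorted(hits)
    return annotation


# Made-up coordinates, for illustration only.
probesets = [
    ("ps1", "chr1", 1300, 1330),
    ("ps2", "chr1", 7000, 7030),
    ("ps3", "chr2", 150, 180),
    ("ps4", "chr1", 4900, 4930),
]
genes = [
    ("GeneA", "chr1", 1000, 5000),
    ("GeneB", "chr1", 4800, 8000),
    ("GeneC", "chr2", 500, 900),
]

result = annotate_probesets(probesets, genes)
for probeset_id, symbols in result.items():
    print(probeset_id, symbols)
```

Note that a probeset can come back with zero genes or with more than one, so you still need a rule for both cases before attaching a single symbol to each row of the ExpressionSet.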
Best,
Jim
>
> I can import it as an AffyBatch and generate an ExpressionSet with the help
> of the Xmap/exonmap supplied CDF, but there is no annotation attached to it.
>
> OR
>
> I can import the CEL files with the "oligo" package as an Exon Array object
> and generate an ExpressionSet from it.
> However, in that case it still has no annotation.
>
> Surprisingly on the Bioconductor website there are all packages needed to
> deal with Mouse Gene 1.0 ST arrays but the information to work with Mouse
> Exon 1.0 ST arrays seems missing!
>
> What am I doing wrong here? Has someone else had such problems?
>
> Thanks in advance for your effort,
> Andreas
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099