[BioC] Analysis and annotation (full) of Affymetrix Mouse Exon 1.0 ST arrays
cstrato
cstrato at aon.at
Wed Jun 13 20:33:34 CEST 2012
Dear Andreas,
Please note that I am talking only about package xps, which contains its
own annotation, based on the Affymetrix annotation files, in this case
on files "MoEx-1_0-st-v1.na32.mm9.probeset.csv" and
"MoEx-1_0-st-v1.na32.mm9.transcript.csv", respectively. Thus with xps
you can do rma() on the transcript level and get the transcript annotation.
Package xps first creates a "scheme" file (see e.g. script
"script4schemes.R") which contains the Affymetrix annotation files for
probesets and transcripts, including the MoEx 1.0 ST identifiers.
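
To illustrate how transcript-level IDs can be mapped to gene symbols,
RefSeq accessions and Entrez IDs from such an annotation file, here is a
minimal base R sketch. It does not use xps itself. It assumes that the
"gene_assignment" column of the transcript CSV uses the layout
"accession // symbol // description // cytoband // Entrez ID", with
multiple assignments separated by "///" and "---" for no assignment;
please check this against the header of your own file. All IDs, gene
names and expression values below are made up, and parse_gene_assignment()
is a hypothetical helper, not a function of any package.

```r
# Toy stand-in for two columns of the Affymetrix transcript annotation CSV.
# All identifiers and gene entries are invented for illustration.
annot <- data.frame(
  transcript_cluster_id = c("4701234", "4705678", "4709999"),
  gene_assignment = c(
    "NM_000001 // GeneA // hypothetical gene A // 1 A1 // 1001",
    paste("NM_000002 // GeneB // hypothetical gene B // 2 B2 // 1002",
          "NM_000003 // GeneB // hypothetical gene B // 2 B2 // 1002",
          sep = " /// "),
    "---"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split one gene_assignment string into accession,
# symbol and Entrez ID (assumed field order, see text above).
parse_gene_assignment <- function(x) {
  if (is.na(x) || x == "---") {
    return(c(accession = NA_character_, symbol = NA_character_,
             entrez = NA_character_))
  }
  entries <- strsplit(x, " /// ", fixed = TRUE)[[1]]
  fields  <- lapply(entries, function(e) {
    trimws(strsplit(e, " // ", fixed = TRUE)[[1]])
  })
  # Collect the i-th field of every entry, drop duplicates, join with ";".
  pick <- function(i) {
    paste(unique(vapply(fields, function(f) f[i], character(1))),
          collapse = ";")
  }
  c(accession = pick(1), symbol = pick(2), entrez = pick(5))
}

parsed    <- do.call(rbind, lapply(annot$gene_assignment, parse_gene_assignment))
annot_tbl <- data.frame(transcript_cluster_id = annot$transcript_cluster_id,
                        parsed, stringsAsFactors = FALSE)

# Toy expression matrix whose row names are transcript cluster IDs.
expr <- matrix(c(7.1, 8.3, 5.9, 6.4), nrow = 2, byrow = TRUE,
               dimnames = list(c("4705678", "4701234"),
                               c("sample1", "sample2")))

# Align the annotation rows with the rows of the expression matrix.
feature_annot <- annot_tbl[match(rownames(expr),
                                 annot_tbl$transcript_cluster_id), ]
rownames(feature_annot) <- rownames(expr)
print(feature_annot)
```

A table aligned in this way could then be attached to the expression
values as feature-level annotation.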
Best regards
Christian
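
Regarding Jim's suggestion, quoted further down, to take chr, start and
end of each probeset and use findOverlaps() to decide which regions are
interrogated: the idea can be sketched in plain base R as shown here.
This is only a stand-in for findOverlaps() (which is provided by the
IRanges/GenomicRanges packages), not a replacement for it. All
coordinates and names are made up, overlapping_genes() is a hypothetical
helper, and intervals are assumed to be closed (start and end
inclusive). How to treat introns, UTRs and multi-mapping probes remains
the open question Jim raises.

```r
# Made-up probeset coordinates (stand-in for chr/start/end extracted
# from the platform design package).
probesets <- data.frame(
  probeset_id = c("ps1", "ps2", "ps3", "ps4"),
  chr   = c("chr1", "chr1", "chr1", "chr2"),
  start = c(100, 450, 900, 100),
  end   = c(125, 475, 925, 125),
  stringsAsFactors = FALSE
)

# Made-up gene regions (stand-in for a gene model annotation).
genes <- data.frame(
  gene  = c("GeneA", "GeneB"),
  chr   = c("chr1", "chr2"),
  start = c(50, 500),
  end   = c(500, 800),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each probeset, list the genes on the same
# chromosome whose closed interval overlaps the probeset interval.
overlapping_genes <- function(ps, genes) {
  vapply(seq_len(nrow(ps)), function(i) {
    hit <- genes$chr == ps$chr[i] &
           genes$start <= ps$end[i] &
           genes$end   >= ps$start[i]
    if (any(hit)) paste(genes$gene[hit], collapse = ";") else NA_character_
  }, character(1))
}

probesets$gene <- overlapping_genes(probesets, genes)
print(probesets)
```

Probesets that fall outside every gene region get NA here; whether such
probesets should be dropped or annotated differently is a design
decision this sketch does not make.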
On 6/13/12 7:47 PM, Andreas Heider wrote:
> Yes, you are right!
> rma(target=()) can be used to collapse to transcript or probeset level.
> However, the problem is still there, as I am left with a nice
> ExpressionSet object that has values mapped to transcripts (if I decide
> so) but they are only annotated by something like 4701234. That is a
> probeset/transcript name for example. Now that wouldn't be a problem
> given that normally such an identifier could be easily translated via
> Bioconductors annotation packages.
>
> But here comes the most significant part: There is no annotation package
> available that includes MoEx 1.0 ST identifiers!
>
> I am trying to get my package to work on these Exon arrays. And the
> package expects a proper annotation package such as eg. "mouse4302" to
> be attached to the annotation slot of the ExpressionSet.
>
> I'm still puzzled.
>
> 2012/6/13 cstrato <cstrato at aon.at>
>
> Dear Andreas,
>
> As Jim already mentioned, package xps is able to preprocess MoExon
> 1.0 ST arrays at the probeset and the gene level, see also my
> earlier reply to a similar question:
> https://www.stat.math.ethz.ch/pipermail/bioconductor/2012-June/045958.html
>
> Best regards
> Christian
> _._._._._._._._._._._._._._._.___._._
> C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
> V.i.e.n.n.a A.u.s.t.r.i.a
> e.m.a.i.l: cstrato at aon.at
> _._._._._._._._._._._._._._._.___._._
>
>
>
>
> On 6/13/12 4:47 PM, James W. MacDonald wrote:
>
> Hi Andreas,
>
> On 6/13/2012 3:14 AM, Andreas Heider wrote:
>
> Dear mailing list,
> I know this was on the list a couple of times, and I think I read it
> all, but actually I still don't get it right. So here is my problem:
>
> I want to be able to work with Mouse Exon 1.0 ST Arrays (NOT Mouse
> Gene 1.0 ST) in a similar fashion to e.g. HG-U133 arrays.
> That means I want to finally have it accessible as an ExpressionSet
> object with the correct Bioconductor annotation assigned. This should
> include GENE SYMBOLS, RefSeq IDs and ENTREZ IDs.
>
>
> The problem here is that you want to do something that AFAIK isn't easy
> to do. The Gene ST arrays allow you to summarize all the probes that
> interrogate a particular transcript (e.g., all the exon-level probesets
> are collapsed to transcript level, and then you summarize). However, for
> the Exon ST arrays that isn't the case, unless there is something in xps
> to allow for that - I know next to nothing about that package, so
> Christian Stratowa will have to chime in if I am missing something.
>
> For the Exon chips, you are always summarizing at the same probeset
> level, where there are <= 4 probes per probeset, and there can be any
> number of probesets that interrogate a given exon. Lots of these
> probesets interrogate regions that aren't even transcribed, according to
> current knowledge of the genome. When you choose core, extended or full
> probesets, you are just changing the number of probesets being used, not
> summarizing at a different level as with the Gene ST chip.
>
> So when you say you want gene symbols, refseq ids and gene ids, what
> exactly are you after? If a given probeset is in the intron of a gene do
> you want to annotate it as being part of that gene? How about if it is
> in the UTR (or really close to the UTR)? What do you want to do with the
> probesets where one or more of the probes binds in multiple positions in
> the genome? These are all questions that the exonmap package tries to
> consider, and it gets really complicated. That's why Affy went with the
> Gene ST chips - they unleashed the Exon chips on us and couldn't sell
> them because people were saying WTF do I do with this thing?
>
> I don't think there is an easy or obvious answer to your question. If
> you were to come up with what you think are reasonable answers to my
> questions, then it wouldn't be much work to extract the chr, start, end
> from the pd.moex.1.0.st.v1 package, and then use GenomicFeatures (e.g.,
> findOverlaps()) to decide what regions are being interrogated, and
> annotate from there.
>
> Best,
>
> Jim
>
>
>
> I can import it as an AffyBatch and generate an ExpressionSet with the
> help of the Xmap/exonmap supplied CDF, but there is no annotation
> attached to it.
>
> OR
>
> I can import the CEL files with the "oligo" package as an Exon Array
> object and generate an ExpressionSet from it.
> However, in that case it still has no annotation.
>
> Surprisingly, on the Bioconductor website there are all packages needed
> to deal with Mouse Gene 1.0 ST arrays, but the information to work with
> Mouse Exon 1.0 ST arrays seems to be missing!
>
> What am I doing wrong here? Has someone else had such problems?
>
> Thanks in advance for your effort,
> Andreas
>
> _________________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>