[BioC] Ringo - finding enriched regions
Hans-Ulrich Klein
h.klein at uni-muenster.de
Wed Oct 21 12:33:55 CEST 2009
Dear Joern,
your guess was right. It was an issue with my probeAnno object. I
created the probeAnno this way:
(reads is an AlignedRead object with all my probes)
pos = data.frame(CHROMOSOME=chromosome(reads),
                 PROBE_ID=as.character(id(reads)),
                 POSITION=position(reads),
                 LENGTH=width(reads))
probeAnno = posToProbeAnno(pos, genome="Mus_musculus.NCBIM37.55",
                           microarrayPlatform="mm.prompr.v02")
Adding the parameter stringsAsFactors=FALSE to the data.frame() call
solved my problem. Without that parameter, the "X.index" elements in my
probeAnno were factors.
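
For illustration, here is a minimal sketch of why factor identifiers can select the wrong rows. It uses toy data, not the actual probeAnno or Ringo objects; the names intensities, pos_fac and pos_chr are made up for this example. In base R, subsetting a vector with a factor uses the factor's integer codes rather than its labels:

```r
# Toy stand-in for smoothed probe intensities, named by probe ID
# (hypothetical values, not taken from the real data set).
intensities <- c("10" = 1.5, "2" = 0.2, "33" = 0.9)

# The same IDs stored once as a factor and once as character strings.
pos_fac <- data.frame(PROBE_ID = c("33", "10"), stringsAsFactors = TRUE)
pos_chr <- data.frame(PROBE_ID = c("33", "10"), stringsAsFactors = FALSE)

is.factor(pos_fac$PROBE_ID)     # TRUE
is.character(pos_chr$PROBE_ID)  # TRUE

# Character index: elements are looked up by name, as intended.
by_name <- intensities[pos_chr$PROBE_ID]

# Factor index: R uses the integer codes (here 2 and 1, because the
# levels are sorted as "10", "33"), so elements are picked by position.
by_code <- intensities[pos_fac$PROBE_ID]

names(by_name)  # "33" "10"
names(by_code)  # "2"  "10"
```

Passing stringsAsFactors=FALSE explicitly, as above, keeps the behaviour independent of the default of the R version in use.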
Thanks,
Hans-Ulrich
Joern Toedling wrote:
> Dear Hans-Ulrich,
>
> in that case, I am afraid I cannot immediately tell you what the source of the
> problem is. You are right, the smoothed probe intensities of these probes
> should all be greater than y0. And in my analyses, I have never observed
> anything else.
> What do the ChIP-enriched regions look like when you plot them?
> (for example via
> plot(chers[[1]], eSetS, probeAnno)
> ). If these plots indicate correct results, then at least the positions of your
> enriched regions seem to be correct and the problem is with assigning the
> probe identifiers to the enriched regions.
> There might be an issue with your probeAnno object and the way you generate it.
> What is the result of
> probeAnno["1.index"][probeAnno["1.start"]>=10001787 &
> probeAnno["1.start"]<=10002329]
> ? These probe identifiers should include the ones in the first enriched region.
> I would suggest using different probe names than "as.character" of the row
> numbers. Due to R's implicit conversion between vector formats, such names
> could lead to all sorts of hard-to-debug problems.
> If you provide me with a short excerpt of your data and the example script, I
> could have a deeper look into it to see where the problem might be.
>
> Best regards,
> Joern
>
> On Tue, 20 Oct 2009 21:10:20 +0200, Hans-Ulrich Klein wrote
>
>> Dear Joern,
>>
>> the feature names of my ExpressionSet instance are:
>>
>> > all(featureNames(eSetS) == as.character(1:nrow(eSetS)))
>> [1] TRUE
>>
>> So in my case both expressions
>> > exprs(eSetS)[as.numeric(chers[[1]]@probes),]
>> and
>> > exprs(eSetS)[chers[[1]]@probes,]
>> return the same probes that have log ratios smaller than y0 as
>> described below.
>>
>> Best wishes,
>> Hans-Ulrich
>>
>> Joern Toedling wrote:
>>
>>> Hello,
>>>
>>> I suspect that there is some issue with converting vectors between different
>>> formats and the identifiers of your probes (the 'featureNames' of the
>>> ExpressionSet) here.
>>> The actual way to obtain those intensities with version 1.8.0 should be
>>>
>>> exprs(eSetS)[as.numeric(chers[[1]]@probes),]
>>>
>>> Please let me know if this does not give the expected results.
>>>
>>> However, I admit that providing indices as a character vector for the probes
>>> slot was not necessary and rather misleading. Thus I have made slight changes
>>> to the function and provided an additional method 'probes' which allows you to
>>> obtain a character vector of probe names from each ChIP-enriched region
>>> without having to access any slots directly.
>>>
>>> These changes can be found in the current development version 1.9.15, which
>>> you can obtain from the Bioconductor repository tomorrow, and will also be in
>>> the new release version (Ringo 1.10.0) at the end of this month.
>>>
>>> With the new version, the following is the preferred way for obtaining the
>>> values:
>>>
>>> exprs(eSetS)[probes(chers[[1]]),]
>>>
>>> Hope this helps.
>>>
>>> Best regards,
>>> Joern
>>>
>>> On Mon, 19 Oct 2009 12:05:03 +0200, Hans-Ulrich Klein wrote
>>>
>>>
>>>> Hello,
>>>>
>>>> I am confused about the results returned from the
>>>> "findChersOnSmoothed" function in the Ringo package. I have an
>>>> ExpressionSet object storing normalized log ratios (ChIP / Control)
>>>> from three replicates. I use this analysis workflow:
>>>>
>>>> > eSetS = computeRunningMedians(eSet, probeAnno, modColumn="type",
>>>> winHalfSize=400, min.probes=5,
>>>> combineReplicates=TRUE)
>>>> [...]
>>>> > y0 = upperBoundNull(exprs(eSetS), prob=0.99)
>>>> > chers = findChersOnSmoothed(eSetS, probeAnno, thresholds=y0,
>>>> distCutOff=600, minProbesInRow=3)
>>>>
>>>> Surprisingly, the first enriched region does not contain any probe
>>>> intensity above the threshold y0. This applies to many regions
>>>> called enriched.
>>>>
>>>> > chers[[1]]
>>>> BCR_ABL.chr1.cher1
>>>> Chr 1 : 10001787 - 10002329
>>>> Antibody : BCR_ABL
>>>> Maximum level = 1.665789
>>>> Score = 9.486747
>>>> Spans 15 probes.
>>>> > y0
>>>> [1] 0.7279903
>>>> > dim(eSetS)
>>>> Features Samples
>>>> 4212009 1
>>>> > exprs(eSetS[chers[[1]]@probes,])
>>>> BCR_ABL
>>>> 112645 0.2140274
>>>> 112646 0.2469170
>>>> 112647 0.2485301
>>>> 112648 0.2501433
>>>> 112649 0.2765225
>>>> 112650 0.2813286
>>>> 112651 0.2803291
>>>> 112652 0.2727159
>>>> 112653 0.2469170
>>>> 112654 0.2469170
>>>> 112655 0.1166212
>>>> 112656 0.2355814
>>>> 112657 0.2355814
>>>> 112658 0.1608379
>>>> 112659 0.2063285
>>>>
>>>> Did I check the correct probes? Shouldn't the intensities be > 0.727?
>>>>
>>>> My Ringo version is 1.8.0.
>>>>
>>>> Thanks in advance,
>>>> Hans-Ulrich
>>>>
>
>
>
--
Hans-Ulrich Klein
Department of Medical Informatics and Biomathematics
University of Münster
Domagkstrasse 9
48149 Münster, Germany
Tel.: +49 (0)251 83-58405