[BioC] Ringo - finding enriched regions

Wed Oct 21 12:33:55 CEST 2009

Dear Joern,

your guess was right. It was an issue with my probeAnno object. I 
created the probeAnno this way:
(reads is an AlignedRead object with all my probes)

pos = data.frame(CHROMOSOME=chromosome(reads),
                 PROBE_ID=as.character(id(reads)),
                 POSITION=position(reads),
                 LENGTH=width(reads))

probeAnno = posToProbeAnno(pos, genome="Mus_musculus.NCBIM37.55",
                           microarrayPlatform="mm.prompr.v02")

Adding the parameter stringsAsFactors=FALSE to the data.frame() call 
solved my problem. Without that parameter, the "X.index" elements of my 
probeAnno were factors rather than character vectors.
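
A minimal sketch of the effect, using toy data rather than my actual probes.
The names ids, posFactor, posChar and expr are made up for illustration, and
no Ringo functions are involved. Note that data.frame() defaults to
stringsAsFactors=FALSE from R 4.0.0 on, so the old behaviour is requested
explicitly here:

```r
# Toy probe IDs: row numbers stored as text (hypothetical example data).
ids <- as.character(c(9, 10, 11))

# stringsAsFactors = TRUE reproduces the data.frame() default before R 4.0.0.
posFactor <- data.frame(PROBE_ID = ids, stringsAsFactors = TRUE)
posChar   <- data.frame(PROBE_ID = ids, stringsAsFactors = FALSE)

# Factor levels are sorted as strings ("10", "11", "9"), so converting the
# factor to numeric yields level codes, not the original IDs.
codes   <- as.numeric(posFactor$PROBE_ID)   # 3 1 2
numbers <- as.numeric(posChar$PROBE_ID)     # 9 10 11

# Toy expression matrix whose row names are the same IDs.
expr <- matrix(c(0.9, 1.0, 1.1), ncol = 1,
               dimnames = list(ids, "sample"))

# Character indexing selects rows by name; factor indexing silently
# uses the integer level codes and returns the rows in a different order.
byName   <- expr[posChar$PROBE_ID, "sample"]     # 0.9 1.0 1.1
byFactor <- expr[posFactor$PROBE_ID, "sample"]   # 1.1 0.9 1.0
```

With millions of probes the mismatch is much harder to spot than in this
three-row case, which would explain why the regions looked plausible while
the probe identifiers attached to them did not.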

Thanks,
Hans-Ulrich

Joern Toedling wrote:
> Dear Hans-Ulrich,
>
> in that case, I am afraid I cannot immediately tell you what the source of the
> problem is. You are right, the smoothed probe intensities of these probes
> should all be greater than y0. And in my analyses, I have never observed
> anything else.
> What do the ChIP-enriched regions look like when you plot them?
> (for example via
> plot(chers[[1]], eSetS, probeAnno)
> ). If these plots indicate correct results, then at least the positions of your
> enriched regions seem to be correct and the problem is with assigning the
> probe identifiers to the enriched regions.
> There might be an issue with your probeAnno object and the way you generate it. 
> What is the result of
> probeAnno["1.index"][probeAnno["1.start"]>=10001787 &
> probeAnno["1.start"]<=10002329]
> ? These probe identifiers should include the ones in the first enriched region.
> I would suggest using probe names other than "as.character" of the row
> numbers. Due to R's implicit conversion between vector formats, such names
> could lead to all sorts of hard-to-debug problems.
> If you provide me with a short excerpt of your data and the example script, I
> could have a deeper look into it to see where the problem might be.
>
> Best regards,
> Joern
>
> On Tue, 20 Oct 2009 21:10:20 +0200, Hans-Ulrich Klein wrote
>   
>> Dear Joern,
>>
>> the feature names of my ExpressionSet instance are:
>>
>>  > all(featureNames(eSetS) == as.character(1:nrow(eSetS)))
>> [1] TRUE
>>
>> So in my case both expressions
>>  > exprs(eSetS)[as.numeric(chers[[1]]@probes),]
>> and
>>  > exprs(eSetS)[chers[[1]]@probes,]
>> return the same probes that have log ratios smaller than y0 as 
>> described below.
>>
>> Best wishes,
>> Hans-Ulrich
>>
>> Joern Toedling wrote:
>>     
>>> Hello,
>>>
>>> I suspect that there is some issue with converting vectors between different
>>> formats and the identifiers of your probes (the 'featureNames' of the
>>> ExpressionSet) here.
>>> The actual way to obtain those intensities with version 1.8.0 should be
>>>
>>> exprs(eSetS)[as.numeric(chers[[1]]@probes),]
>>>
>>> Please let me know if this does not give the expected results.
>>>
>>> However, I admit that providing indices as a character vector for the probes
>>> slot was not necessary and rather misleading. Thus I have made slight changes
>>> to the function and provided an additional method 'probes' which allows you to
>>> obtain a character vector of probe names from each ChIP-enriched region
>>> without having to access any slots directly.
>>>
>>> These changes can be found in the current development version 1.9.15, which
>>> you can obtain from the Bioconductor repository tomorrow, and will also be in
>>> the new release version (Ringo 1.10.0) at the end of this month.
>>>
>>> With the new version, the following is the preferred way for obtaining the
>>> values:
>>>
>>> exprs(eSetS)[probes(chers[[1]]),]
>>>
>>> Hope this helps.
>>>
>>> Best regards,
>>> Joern
>>>
>>> On Mon, 19 Oct 2009 12:05:03 +0200, Hans-Ulrich Klein wrote
>>>   
>>>       
>>>> Hello,
>>>>
>>>> I am confused about the results returned from the 
>>>> "findChersOnSmoothed" function in the Ringo package. I have an 
>>>> ExpressionSet object storing normalized log ratios (ChIP / Control)
>>>>  from three replicates. I use this analysis workflow:
>>>>
>>>>  > eSetS = computeRunningMedians(eSet, probeAnno, modColumn="type",
>>>>                                 winHalfSize=400, min.probes=5,
>>>>                                 combineReplicates=TRUE)
>>>> [...]
>>>>  > y0 = upperBoundNull(exprs(eSetS), prob=0.99)
>>>>  > chers = findChersOnSmoothed(eSetS, probeAnno, thresholds=y0,
>>>>                               distCutOff=600, minProbesInRow=3)
>>>>
>>>> Surprisingly, the first enriched region does not contain any probe 
>>>> intensity above the threshold y0. This applies to many regions 
>>>> called enriched.
>>>>
>>>>  > chers[[1]]
>>>> BCR_ABL.chr1.cher1
>>>> Chr 1 : 10001787 - 10002329
>>>> Antibody : BCR_ABL
>>>> Maximum level = 1.665789
>>>> Score = 9.486747
>>>> Spans 15 probes.
>>>>  > y0
>>>> [1] 0.7279903
>>>>  > dim(eSetS)
>>>> Features  Samples
>>>>  4212009        1
>>>>  > exprs(eSetS[chers[[1]]@probes,])
>>>>          BCR_ABL
>>>> 112645 0.2140274
>>>> 112646 0.2469170
>>>> 112647 0.2485301
>>>> 112648 0.2501433
>>>> 112649 0.2765225
>>>> 112650 0.2813286
>>>> 112651 0.2803291
>>>> 112652 0.2727159
>>>> 112653 0.2469170
>>>> 112654 0.2469170
>>>> 112655 0.1166212
>>>> 112656 0.2355814
>>>> 112657 0.2355814
>>>> 112658 0.1608379
>>>> 112659 0.2063285
>>>>
>>>> Did I check the correct probes? Shouldn't the intensities be > 0.727?
>>>>
>>>> My Ringo version is 1.8.0.
>>>>
>>>> Thanks in advance,
>>>> Hans-Ulrich

-- 
Hans-Ulrich Klein
Department of Medical Informatics and Biomathematics
University of Münster
Domagkstrasse 9
48149 Münster, Germany
Tel.: +49 (0)251 83-58405