[Bioc-sig-seq] coverage vectors and subseq
kirti prakash
kirtiprakash3.14 at gmail.com
Wed Aug 18 16:05:50 CEST 2010
Hi Martin,
Thanks again. I really like your way of explaining... even new comers
like me are able to learn how data compression works. Its really a
nice approach to break the problem in parts in order to understand it.
I have some human histone modification data and I am trying to plot
the tag densities.
This is the complete code(from the above link) that I am trying to implement...
aln.H3K4me1 <- readAligned("dirpath", type="Bowtie")
cov.H3K4me1 = coverage(aln.H3K4me1)
deltaConvolve <- function( coverage, winCentres, revStrand, left, right)
{
result <- rep( 0, right - left + 1 )
for( i in seq(along=winCentres) ) {
v <- as.vector(seqselect( coverage, winCentres[i] - left,
winCentres[i] + right ))
if( revStrand[i] )
v <- rev( v )
result <- result + v
}
result
}
library("biomaRt")
mart <- useMart( "ensembl", dataset = "hsapiens_gene_ensembl" )
martReply <- getBM( attributes=c( "transcript_start",
"transcript_end", "strand" ), filters="chromosome_name", values="10",
mart=mart )
tss <- ifelse( martReply$strand == 1, martReply$transcript_start,
martReply$transcript_end )
win = c(-50, 50)
H3K4me3 <- deltaConvolve( cov.H3K4me1$chr10, tss, martReply$strand ==
-1, left=win[1], right=win[2] )
The code runs for a while but after couple of minutes it gives the
above errors. so possibly some but not all values are out of bound.
As clear from above I get TSS from ensembl. There are some 6000 TSS
for chr10 and window size I have reduced to 101 so that it doesn't
gets out of bound but still I get this error.
Do I need to have some try-catch block?
Thank you,
Best regards,
Kirti Prakash
On Wed, Aug 18, 2010 at 4:01 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
> On 08/17/2010 09:41 PM, kirti prakash wrote:
>> Hi Martin,
>>
>> Thanks a lot for your help. Bioconductor is a great place as one gets
>> the help from the best and its completely free. Its seems all one
>> needs to do good research in bioinformatics is a computer, some data
>> and Bioconductor.
>>
>> Its the same tutorial... the link you mentioned above . I cannot
>> attach the tutorials due to the limit of mail size.
>
> I meant that if the document is available in a public place, then
> provide a URL to it so that those trying to help can better understand
> what your problem is.
>
>>
>> I did exactly as you told and most the things work fine except...
>>
>> Later in the tutorial they use deltaConvolve function...
>>
>> H3K4me1 <- deltaConvolve( cov.H3K4me1.chr10, tss, martReply$strand ==
>> -1, left=win[1], right=win[2] )
>> Error in .bracket.Index(i, lx, asRanges = TRUE) :
>> range index out of bounds
>> Error in as.vector(seqselect(coverage, winCentres[i] + left, winCentres[i] + :
>> error in evaluating the argument 'x' in selecting a method for
>> function 'as.vector'
>>
>> Also you mention endoapply above... it also gave me same error...
>>
>> endoapply(cov.H3K4me1, seqselect, 123456-50, 123456+50)
>> Error in .bracket.Index(i, lx, asRanges = TRUE) :
>> range index out of bounds
>
> One approach to solving problems is to try to simplify them. So let's
> try to reproduce the error but with a simpler example. You're selecting
> a subset of an Rle object
>
>> seqselect(Rle(1:10, 10:1), 10, 20)
> 'integer' Rle of length 11 with 3 runs
> Lengths: 1 9 1
> Values : 1 2 3
>
> The error says something about 'index out of bounds', which suggests
>
>> seqselect(Rle(1:10, 10:1), 10, 2000)
> Error in .bracket.Index(i, lx, asRanges = TRUE) :
> range index out of bounds
>> seqselect(Rle(1:10, 10:1), -10, 20)
> Error in .bracket.Index(i, lx, asRanges = TRUE) :
> range index out of bounds
>
> Any hints to what your original problem is, and how to solve it?
>
> Martin
>
>>
>>
>> I found that both of these are exception handling errors but I again I
>> don't know to fix them. It would be great if you can again help on
>> this.
>
>>
>> Thanks again,
>>
>> Best Regards,
>>
>> Kirti Prakash
>>
>> On Wed, Aug 18, 2010 at 7:33 AM, kirti prakash
>> <kirtiprakash3.14 at gmail.com> wrote:
>>> Hi Martin,
>>>
>>> Thanks a lot for your help. Bioconductor is a great place as one gets
>>> the help from the best and its completely free. Its seems all one
>>> needs to do good research in bioinformatics is a computer, some data
>>> and Bioconductor.
>>>
>>> Its the same tutorial the link you mentioned above . In future I will
>>> attach the pdfs of the tutorials also.
>>>
>>> I did exactly as you told and most the things work fine except...
>>>
>>> Later in the tutorial they use deltaConvolve function...
>>>
>>> H3K4me1 <- deltaConvolve( cov.H3K4me1.chr10, tss, martReply$strand ==
>>> -1, left=win[1], right=win[2] )
>>> Error in .bracket.Index(i, lx, asRanges = TRUE) :
>>> range index out of bounds
>>> Error in as.vector(seqselect(coverage, winCentres[i] + left, winCentres[i] + :
>>> error in evaluating the argument 'x' in selecting a method for
>>> function 'as.vector'
>>>
>>> Also you mention endoapply above... it also gave me same error...
>>>
>>> endoapply(cov.H3K4me1, seqselect, 123456-50, 123456+50)
>>> Error in .bracket.Index(i, lx, asRanges = TRUE) :
>>> range index out of bounds
>>>
>>>
>>> I found that both of these are exception handling errors but I again I
>>> don't know to fix them. It would be great if you can again help on
>>> this.
>>>
>>> Thanks again,
>>>
>>> Best Regards,
>>>
>>> Kirti Prakash
>>>
>>>
>>>
>>>
>>> On Tue, Aug 17, 2010 at 7:47 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>>>> kirti prakash <kirtiprakash3.14 at gmail.com> writes:
>>>>
>>>>> Hi,
>>>>>
>>>>> I was reading the material "EMBL course on Short Read analysis with
>>>>> Bioconductor: An exercise with coverage vectors" by Simon Anders.
>>>>
>>>> It took me a second to find
>>>>
>>>> http://bioconductor.org/help/course-materials/2009/EMBLJune09/Practicals/TSS/
>>>>
>>>> is that the tutorial you mean?
>>>>
>>>>>
>>>>> I tried this...
>>>>> aln <- readAligned("dirPath", pattern="sequence.map", type="Bowtie")
>>>>> cov = coverage(aln)
>>>>> cov
>>>>> [[1]]
>>>>> SimpleRleList of length 25
>>>>> $chr1
>>>>> 'integer' Rle of length 247188620 with 840106 runs
>>>>> Lengths: 463 36 6823 36 550058 ... 713 36 2034 36
>>>>> Values : 0 1 0 1 0 ... 0 1 0 2
>>>>>
>>>>> $chr10
>>>>> 'integer' Rle of length 135373320 with 446681 runs
>>>>> Lengths: 88078 36 3880 36 12451 ... 20 50 36 22054 36
>>>>> Values : 0 1 0 1 0 ... 1 0 1 0 1
>>>>> .
>>>>> .
>>>>> .
>>>>>
>>>>>> cvg <- cov$chr10
>>>>>> as.vector(subseq(cvg, 123456+50, 123456-50))
>>>>
>>>> I think this should be
>>>>
>>>> seqselect(cvg$chr10, 123456+50, 123456-50)
>>>>
>>>> or, to subset all chromosomes,
>>>>
>>>> endoapply(cvg, seqselect, 123456+50, 123456-50)
>>>>
>>>> (though it doesn't seem like you'd usually want to select consistent
>>>> coordinates across chromosomes).
>>>>
>>>> To get here, I looked up the help page for Rle-class
>>>>
>>>> > class?Rle
>>>>
>>>> This is with
>>>>
>>>>> sessionInfo()
>>>> R version 2.11.1 Patched (2010-06-03 r52215)
>>>> x86_64-unknown-linux-gnu
>>>>
>>>> locale:
>>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods
>>>> base
>>>>
>>>> other attached packages:
>>>> [1] ShortRead_1.6.2 Rsamtools_1.0.5 lattice_0.18-8
>>>> [4] Biostrings_2.16.9 GenomicRanges_1.0.5 IRanges_1.6.11
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] Biobase_2.8.0 grid_2.11.1 hwriter_1.2 tools_2.11.1
>>>>
>>>> Martin
>>>>
>>>>
>>>>> Error in function (classes, fdef, mtable) :
>>>>> unable to find an inherited method for function "subseq", for signature "list"
>>>>> Error in as.vector(subseq(cvg, 123456 + 50, 123456 - 50)) :
>>>>> error in evaluating the argument 'x' in selecting a method for
>>>>> function 'as.vector'
>>>>>
>>>>> I guess * 'integer' Rle of length* should be 'numeric' Rle of length
>>>>> as per the booklet ... but I don't know how to fix it.
>>>>>
>>>>> I know I am making some stupid mistake. It would be great if anyone
>>>>> can provide some help on this.
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Kirti Prakash
>>>>>
>>>>> _______________________________________________
>>>>> Bioc-sig-sequencing mailing list
>>>>> Bioc-sig-sequencing at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>>
>>>> --
>>>> Martin Morgan
>>>> Computational Biology / Fred Hutchinson Cancer Research Center
>>>> 1100 Fairview Ave. N.
>>>> PO Box 19024 Seattle, WA 98109
>>>>
>>>> Location: Arnold Building M1 B861
>>>> Phone: (206) 667-2793
>>>>
>>>
>
>
> --
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
>
More information about the Bioc-sig-sequencing
mailing list