[Bioc-sig-seq] Rsamtools

Thu Jan 28 22:50:34 CET 2010

Martin/Steve,

Thanks a lot guys, using the correct chromosome names fixed that problem. Instead of "chr1", I simply needed to use "1". Now I am able to import reads by looping through, but I am getting some strange results. In this whole-exon-capture dataset I would expect the largest chromosomes to have the highest number of reads but that does not appear to be what I am getting (highlighted below). With respect to file size the BAM appears to be within the range I would expect, but after importing into my AlignedRead object I should have roughly 100 times the number of reads I am getting now. Anyone see something I am doing incorrectly? 

Thanks again!

-karl

library(org.Hs.eg.db)

lens <- org.Hs.egCHRLENGTHS
lens <- lens[c(1:22,"X","Y")]

i=1
which <- RangesList(IRanges(start=1,end=lens[i]))
names(which) <- names(lens)[i]
params <- ScanBamParam(which=which)
finAR <- .readAligned_bam("../aln.sorted.nodupes.bam",param=params)

for(i in c(2:length(lens))){

	which <- RangesList(IRanges(start=1,end=lens[i]))
	names(which) <- names(lens)[i]
	params <- ScanBamParam(which=which)

	finAR <- append(finAR,.readAligned_bam("../aln.sorted.nodupes.bam",param=params))
}

> sort(table(chromosome(finAR)),dec=T)[1:5]

    12     11      8      X      2
619386 281709 260369 217034 173985

-----Original Message-----
From: Martin Morgan [mailto:mtmorgan at fhcrc.org] 
Sent: Wednesday, January 27, 2010 8:26 PM
To: Steve Lianoglou
Cc: Dykema, Karl; Bioc-sig-sequencing at r-project.org
Subject: Re: [Bioc-sig-seq] Rsamtools

On 01/27/2010 02:18 PM, Steve Lianoglou wrote:
> Hi Karl,
> 
> On Wed, Jan 27, 2010 at 5:00 PM, Dykema, Karl <Karl.Dykema at vai.org> wrote:
>> Hi all,
>>
>> We have been using Rsamtools to import BAM files also. On previous sequencing runs the BAM files were around 800 megs and I could import the entire BAM with .readAligned_bam(). Our new sequencing data creates larger BAM files and I am unable to import them like before. So, as Steve pointed out to me offline, I probably will need to import the reads from each chromosome individually. Unfortunately it does not seem to be recognizing my chromosome names, error msg below. Has anyone had a similar problem or recognize something that I might be doing incorrectly? Thanks in advance.
>>
>>
>>> which <- RangesList(chr1=IRanges(start=1,end=247249719))
>>> params <- ScanBamParam(which=which)
>>>
>>> chr1reads <- scanBam("../aln.sorted.nodupes.bam",param=params)
>> Error in function (bam, tmpl, space, start, end)  : failed to scan BAM
>>  file: �
>>  last record: 0
>> In addition: Warning message:
>> In function (bam, tmpl, space, start, end)  : 'space' not in BAM header
>>  file: ../aln.sorted.nodupes.bam
>>  space: chr1
> 
> Perhaps the chromosome names are different in your BAM file. I think
> you can list them using samtools, like so (from the command line):
> 
> $ samtools view -H aln.sorted.nodupes.bam

I think also the currently undocumented scanBamHeader

  bam = list.files(system.file("extdata", package="Rsamtools"),
                   "bam$",full=TRUE)
  scanBamHeader(bam)[[1]]$targets

which reports targets and numbers recorded in the header (the numbers
are NOT the number of reads in the file, but numbers inserted by
arbitrary upstream software, likewise the names reported in the header
might differ from the names actually present in the bam file).

seq1 seq2
1575 1584

Martin

The information transmitted is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited. If you received this in error, please contact the sender and delete the material from any computer.