[Bioc-sig-seq] readBamGappedAlignments() cigar error
Hervé Pagès
hpages at fhcrc.org
Fri Jun 25 18:34:06 CEST 2010
Hi Craig,
Sorry for taking so long to answer this. Yes, some tools seem to generate
SAM/BAM output with zero-length operations in the CIGAR field,
and the official SAM Format Spec 0.1.2-draft apparently has no
problem with that:
http://samtools.sourceforge.net/SAM1.pdf
(see the regular expression for the CIGAR string: ([0-9]+[MIDNSHP])+|\* )
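To make the point concrete, here is a minimal sketch in plain Python (not GenomicRanges code) that applies the regular expression quoted above to the CIGAR string from the report. The helper name is_valid_cigar is hypothetical and only serves the illustration.

```python
import re

# CIGAR pattern as quoted from the SAM spec draft: one or more
# <length><operation> pairs, or "*" when no CIGAR is available.
CIGAR_RE = re.compile(r"([0-9]+[MIDNSHP])+|\*")

def is_valid_cigar(cigar):
    """Return True if the whole string matches the spec's pattern."""
    return CIGAR_RE.fullmatch(cigar) is not None

# [0-9]+ accepts "0", so the zero-length N operation passes.
print(is_valid_cigar("9M0N32M9H"))  # True
```

Since the length is just [0-9]+, nothing in the pattern rules out a length of zero.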
I've modified the CIGAR handling code in GenomicRanges to
allow zero-length operations in CIGAR strings.
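For illustration only, one way a parser can tolerate zero-length operations is to skip them while walking the CIGAR string. The sketch below is plain Python under stated assumptions, not the actual GenomicRanges implementation: the function name cigar_to_ref_blocks is hypothetical, only M operations produce blocks, and D and N are treated as advancing the reference position.

```python
import re

# One <length><operation> pair, and the full-string pattern from the spec draft.
OP_RE = re.compile(r"([0-9]+)([MIDNSHP])")
CIGAR_RE = re.compile(r"([0-9]+[MIDNSHP])+")

# Assumption for this sketch: these operations advance the reference position.
REF_CONSUMING = {"M", "D", "N"}

def cigar_to_ref_blocks(cigar, start):
    """Return 1-based inclusive (start, end) reference blocks for the M operations."""
    if CIGAR_RE.fullmatch(cigar) is None:
        raise ValueError("malformed CIGAR string: %r" % cigar)
    blocks = []
    pos = start
    for length, op in OP_RE.findall(cigar):
        n = int(length)
        if n == 0:
            continue  # zero-length operation: nothing to do, not an error
        if op == "M":
            blocks.append((pos, pos + n - 1))
        if op in REF_CONSUMING:
            pos += n
    return blocks

print(cigar_to_ref_blocks("9M0N32M9H", 100))  # [(100, 108), (109, 140)]
```

With the zero-length N skipped, the two M operations simply yield adjacent blocks.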
This will be available via biocLite() in the next 24 hours in
GenomicRanges 1.0.5 (release) and GenomicRanges 1.1.15 (devel).
Cheers,
H.
On 05/27/2010 03:29 PM, Craig Johnson wrote:
> I have two BAM files generated from ABI's Bioscope aligner that I want to import using readBamGappedAlignments(). One of the files imports without issue but the second gives me this error:
>
> Error in cigarToIRangesListByAlignment(x@cigar, x@start) :
> in 'cigar' element 3315124: invalid CIGAR operation length at char 5
>
> The cigar at line 3315124 of the SAM file is 9M0N32M9H
>
> Can anyone suggest what causes this error? I have reads earlier in the SAM/BAM file that have '0N' in them, so even though I don't know what '0N' means, that doesn't seem to be the issue.
>
>> sessionInfo()
> R version 2.11.0 (2010-04-22)
> x86_64-unknown-linux-gnu
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] org.Hs.eg.db_2.4.1 RSQLite_0.9-0 DBI_0.2-5
> [4] AnnotationDbi_1.10.1 Biobase_2.8.0 rtracklayer_1.8.1
> [7] RCurl_1.4-2 bitops_1.0-4.1 GenomicFeatures_1.0.0
> [10] Rsamtools_1.0.1 Biostrings_2.16.0 GenomicRanges_1.0.1
> [13] IRanges_1.6.2
>
> loaded via a namespace (and not attached):
> [1] biomaRt_2.4.0 BSgenome_1.16.1 tools_2.11.0 XML_3.1-0
>
> Thank you,
> Craig