[Bioc-sig-seq] Minor bug in method 'aligned' of class 'PairwiseAlignedFixedSubject'

Mon May 25 23:36:57 CEST 2009

Joern,
The aligned method for PairwiseAlignedFixedSubject objects returns an
XStringSet containing the alignments in the "coordinate system" of the
fixed subject: the first position corresponds to the first position
of the fixed subject, the second position to the second position of
the fixed subject, and so on. As a result, aligned shows
matched/mismatched characters and deletions, but it does not show
insertions. Leading and trailing dashes mark subject positions that
lie outside the alignment, not positions within the read. I'll make
the man page clearer to avoid confusion in the future.
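
To make the coordinate system concrete, here is a minimal base-R sketch
of the projection described above. It is not the Biostrings
implementation: project_onto_subject is a hypothetical helper, and the
gapped strings passed to it are hand-written for illustration rather
than taken from pairwiseAlignment output.

```r
# Hypothetical helper (not part of Biostrings): project one gapped
# pairwise alignment into the coordinate system of the subject.
#   aligned_pattern, aligned_subject: equal-length strings with "-" gaps
#   subject_start: 1-based subject position where the alignment begins
#   subject_length: full length of the fixed subject
project_onto_subject <- function(aligned_pattern, aligned_subject,
                                 subject_start, subject_length) {
  p <- strsplit(aligned_pattern, "", fixed = TRUE)[[1]]
  s <- strsplit(aligned_subject, "", fixed = TRUE)[[1]]
  stopifnot(length(p) == length(s))

  # Columns where the subject has a gap are insertions in the pattern.
  # They have no subject position, so they are dropped.
  covered <- p[s != "-"]

  # Pad with dashes for subject positions outside the aligned range.
  left_pad  <- subject_start - 1
  right_pad <- subject_length - left_pad - length(covered)
  stopifnot(left_pad >= 0, right_pad >= 0)

  paste0(strrep("-", left_pad),
         paste(covered, collapse = ""),
         strrep("-", right_pad))
}

subject <- "AAAACCCCGGGGTTTT"
n <- nchar(subject)

# Insertions in the pattern disappear from the projected view.
ins <- project_onto_subject("AAAAAACCCCCCGGGGTTTT",
                            "AAAA--CCCC--GGGGTTTT", 1, n)
# Deletions (gaps in the pattern) stay visible as dashes.
del <- project_onto_subject("AAAA----GGGGTTTT",
                            "AAAACCCCGGGGTTTT", 1, n)
# An alignment covering only the subject's last four positions.
tail_only <- project_onto_subject("TTTT", "TTTT", 13, n)
```

Every result has the width of the subject, whatever the read length.
That is why the dashes say nothing about where in the read the overlap
lies.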

> suppressMessages(library(Biostrings))
> subject <- DNAString("AAAACCCCGGGGTTTT")
> pattern <- DNAStringSet(c("left" = "ACGTACGTAAAA",
+   "deletions" = "AAAAGGGGTTTT", "insertions" = "AAAAAACCCCCCGGGGTTTT",
+   "right" = "TTTTACGT"))
> pa <- pairwiseAlignment(pattern, subject, type = "global-local")
> alignedPA <- aligned(pa)
> names(alignedPA) <- names(pattern)
> alignedPA
   A DNAStringSet instance of length 4
     width seq                                               names
[1]    16 AAAA------------                                  left
[2]    16 AAAA----GGGGTTTT                                  deletions
[3]    16 AAAACCCCGGGGTTTT                                  insertions
[4]    16 ------------TTTT                                  right
> sessionInfo()
R version 2.10.0 Under development (unstable) (2009-05-08 r48504)
i386-apple-darwin9.6.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Biostrings_2.13.9 IRanges_1.3.16

loaded via a namespace (and not attached):
[1] Biobase_2.5.2

Quoting Joern Toedling <Joern.Toedling at curie.fr>:

> Hello,
>
> I believe there is a minor bug in the visualization provided by the "aligned"
> method of the class "PairwiseAlignedFixedSubject".
> I performed a pairwise alignment looking for overlaps between reads and a
> fixed subject DNAString. Here are 10 reads which all show some form of overlap
> with the subject.
>
>> aligned(pa)
>   A DNAStringSet instance of length 10
>      width seq
>  [1]    37 CTTTAGGCACCAT------------------------
>  [2]    37 CTGTAGTCACCATC-----------------------
>  [3]    37 CTGTAGGCACCATC-----------------------
>  [4]    37 CTGTAGGCACCATC-----------------------
>  [5]    37 CTGTAGGCACCATCAAT--------------------
>  [6]    37 CTTTAGTCAACATC-----------------------
>  [7]    37 CTGTAGG------------------------------
>  [8]    37 CTGTAGGCACCATCAATCGT-----------------
>  [9]    37 --------------------------------GCTTT
> [10]    37 CTGTATTCACCATCAA---------------------
>> start(pattern(pa))
>  [1] 24 23 23 23 20 23 30 17  1 21
>> end(pattern(pa))
>  [1] 36 36 36 36 36 36 36 36  5 36
>
> So in fact for 9 of the 10 reads the overlap is at the end of the read, and
> for one at the start. However, the visualization of the "aligned" method
> wrongly suggests that it is the other way round.
>
> Regards,
> Joern
>
>> sessionInfo()
> R version 2.10.0 Under development (unstable) (2009-05-18 r48561)
> i686-pc-linux-gnu
>
> attached base packages:
> [1] stats     graphics  grDevices datasets  utils     methods   base
>
> other attached packages:
> [1] ShortRead_1.3.5   lattice_0.17-25   BSgenome_1.13.2   Biostrings_2.13.9
> [5] IRanges_1.3.16
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.5.2 grid_2.10.0   hwriter_1.1
>
>
> ---
> Joern Toedling
> Institut Curie -- UMR218
> 26 rue d'Ulm, 75005 Paris, FRANCE
> Tel. +33 (0)156246942