[Bioc-sig-seq] Minor bug in method 'aligned' of class 'PairwiseAlignedFixedSubject'
Patrick Aboyoun
paboyoun at fhcrc.org
Mon May 25 23:36:57 CEST 2009
Joern,
The aligned method for PairwiseAlignedFixedSubject objects extracts an
XStringSet containing the alignments in the "coordinate system" of the
fixed subject; i.e. the first position represents the first position
of the fixed subject, the second position represents the second
position of the fixed subject, etc. This means the output of aligned shows
matched/mismatched characters and deletions (as "-"), but it does not show
insertions in the pattern, because those have no position in the subject.
I'll make the man page clearer to avoid confusion in the future.
> suppressMessages(library(Biostrings))
> subject <- DNAString("AAAACCCCGGGGTTTT")
> pattern <- DNAStringSet(c("left" = "ACGTACGTAAAA",
+   "deletions" = "AAAAGGGGTTTT", "insertions" = "AAAAAACCCCCCGGGGTTTT",
+   "right" = "TTTTACGT"))
> pa <- pairwiseAlignment(pattern, subject, type = "global-local")
> alignedPA <- aligned(pa)
> names(alignedPA) <- names(pattern)
> alignedPA
A DNAStringSet instance of length 4
    width seq              names
[1]    16 AAAA------------ left
[2]    16 AAAA----GGGGTTTT deletions
[3]    16 AAAACCCCGGGGTTTT insertions
[4]    16 ------------TTTT right
> sessionInfo()
R version 2.10.0 Under development (unstable) (2009-05-08 r48504)
i386-apple-darwin9.6.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Biostrings_2.13.9 IRanges_1.3.16
loaded via a namespace (and not attached):
[1] Biobase_2.5.2
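The projection onto the fixed subject's coordinates can be sketched in base R (3.3.0 or later, for strrep) without Biostrings. This is a minimal illustration, not Biostrings' implementation: the helper name project_onto_subject and its arguments are hypothetical, and the gapped alignment strings in the usage lines are assumed placements of gaps chosen to reproduce the transcript output above. Columns where the subject row has a gap (insertions in the pattern) are dropped, and subject positions outside the aligned region are filled with "-". This is also why a read whose end overlaps the start of the subject shows up flush left in the aligned output.

```r
# Hypothetical helper (not part of Biostrings): project one gapped
# pairwise alignment onto the coordinate system of a fixed subject.
#   aligned_pattern, aligned_subject: the two alignment rows, same
#     length, with "-" marking gaps (assumed input format)
#   subject_start: position in the full subject where the alignment begins
#   subject_width: length of the full subject
project_onto_subject <- function(aligned_pattern, aligned_subject,
                                 subject_start, subject_width) {
  p <- strsplit(aligned_pattern, "")[[1]]
  s <- strsplit(aligned_subject, "")[[1]]
  stopifnot(length(p) == length(s))
  # Columns with a gap in the subject are insertions in the pattern;
  # they have no subject position, so they are dropped here.
  core <- paste(p[s != "-"], collapse = "")
  # Fill subject positions outside the aligned region with "-".
  n_left <- subject_start - 1L
  n_right <- subject_width - n_left - nchar(core)
  paste0(strrep("-", n_left), core, strrep("-", n_right))
}

# The "insertions" read: the two extra A's and C's disappear.
project_onto_subject("AAAAAACCCCCCGGGGTTTT", "AAAA--CCCC--GGGGTTTT", 1L, 16L)
# returns "AAAACCCCGGGGTTTT"

# The "right" read: only TTTT lies inside the subject (positions 13-16).
project_onto_subject("TTTTACGT", "TTTT----", 13L, 16L)
# returns "------------TTTT"
```

Because the pattern's own coordinates are discarded, the pattern-side extent of an overlap has to be read from start(pattern(pa)) and end(pattern(pa)), as in the quoted report.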
Quoting Joern Toedling <Joern.Toedling at curie.fr>:
> Hello,
>
> I believe there is a minor bug in the visualization provided by the "aligned"
> method of the class "PairwiseAlignedFixedSubject".
> I performed a pairwise alignment looking for overlaps between reads and a
> fixed subject DNAString. Here are 10 reads which all show some form of
> overlap with the subject.
>
>> aligned(pa)
> A DNAStringSet instance of length 10
>      width seq
>  [1]    37 CTTTAGGCACCAT------------------------
>  [2]    37 CTGTAGTCACCATC-----------------------
>  [3]    37 CTGTAGGCACCATC-----------------------
>  [4]    37 CTGTAGGCACCATC-----------------------
>  [5]    37 CTGTAGGCACCATCAAT--------------------
>  [6]    37 CTTTAGTCAACATC-----------------------
>  [7]    37 CTGTAGG------------------------------
>  [8]    37 CTGTAGGCACCATCAATCGT-----------------
>  [9]    37 --------------------------------GCTTT
> [10]    37 CTGTATTCACCATCAA---------------------
>> start(pattern(pa))
> [1] 24 23 23 23 20 23 30 17 1 21
>> end(pattern(pa))
> [1] 36 36 36 36 36 36 36 36 5 36
>
> So in fact, for 9 of the 10 reads the overlap is at the end of the read, and
> for one it is at the start. However, the visualization of the "aligned"
> method wrongly suggests that it is the other way round.
>
> Regards,
> Joern
>
>> sessionInfo()
> R version 2.10.0 Under development (unstable) (2009-05-18 r48561)
> i686-pc-linux-gnu
>
> attached base packages:
> [1] stats graphics grDevices datasets utils methods base
>
> other attached packages:
> [1] ShortRead_1.3.5 lattice_0.17-25 BSgenome_1.13.2 Biostrings_2.13.9
> [5] IRanges_1.3.16
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.5.2 grid_2.10.0 hwriter_1.1
>
>
> ---
> Joern Toedling
> Institut Curie -- UMR218
> 26 rue d'Ulm, 75005 Paris, FRANCE
> Tel. +33 (0)156246942
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>