[Bioc-sig-seq] ShortRead readAligned Strange Sequences
Dario Strbenac
D.Strbenac at garvan.org.au
Thu Apr 29 10:11:23 CEST 2010
Hello,
I was looking at the first few sequences in the object returned by readAligned, e.g.
> head(sread(aligned))
  A DNAStringSet instance of length 6
    width seq
[1]    36 AACCCTAACCCTAACCCTAACCTTAACCTAACCTTA
[2]    36 TCCGCCTTCAGAGTACCACCGAAATCTGTGCAGAGG
[3]    36 GCCTCTCTGCGCCTGCGCCGGCGGCGTTTCGTTCTC
[4]    36 GCGCGGCGCGCCTCTCGGCGCCTGCGCCGGCGGAGG
[5]    36 GAGGAAAAAGGCAGGACAGAATTACGAGGTGCTGGC
[6]    36 GAAAAAGGCAGGACAGAATTACGAGATGCTGGCNCA
and I looked at their strands
> head(strand(aligned))
[1] + + - - + +
When I searched the .map file for this alignment, I was able to find the first 2 sequences (which are on the + strand), but not the 3rd, nor its complement. The same goes for the 4th, which is also on the - strand. To get the complement I used Biostrings::complementSeq.
Could this be a bug in the way that the readAligned object is created?
I also noticed that the mismatch column for negative-strand reads is exactly the same as in the .map file (when I found them by chr and position - 1, rather than by sequence).
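One thing that may be worth ruling out here: taking the complement alone does not reverse the string, whereas a read aligned to the - strand is commonly written by an aligner as its reverse complement. The sketch below uses base R only to show the two strings for the 3rd read above. complement() and rev_comp() are hypothetical helpers written for illustration (Biostrings provides reverseComplement for this purpose), and whether this explains the failed search is an assumption that has not been checked against the .map file.

```r
# Hypothetical base-R helpers, not part of ShortRead or Biostrings.

# Swap each base for its complement; N stays N.
complement <- function(x) chartr("ACGTN", "TGCAN", x)

# Complement, then reverse the order of the bases in each string.
rev_comp <- function(x) {
  vapply(strsplit(complement(x), ""),
         function(chars) paste(rev(chars), collapse = ""),
         character(1))
}

# The 3rd read from head(sread(aligned)) above (reported on the - strand).
read3 <- "GCCTCTCTGCGCCTGCGCCGGCGGCGTTTCGTTCTC"

complement(read3)  # bases swapped, order unchanged
rev_comp(read3)    # candidate string to search for in the .map file
```

If the second string turns up in the .map file at the expected chr and position, the difference is one of orientation rather than a problem in how the object was built.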
Should this instead be (coordinate - 35) for negative-strand reads, since Bowtie reports all mismatches from the 5' end of the read and ShortRead coordinates are in terms of sequencing cycles?
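To make the arithmetic in that question concrete, here is a minimal sketch of mirroring a mismatch offset within a read, assuming 0-based offsets and the 36-base reads shown above. mirror_offset() is a hypothetical helper, not a ShortRead or Bowtie function. Whether any conversion is needed at all depends on which end of the read each tool counts from, which this sketch does not settle.

```r
# Hypothetical helper: map a 0-based offset counted from one end of a read
# to the equivalent 0-based offset counted from the other end.
mirror_offset <- function(offset, width) {
  stopifnot(all(offset >= 0L), all(offset < width))
  width - 1L - offset
}

read_width <- 36L  # read length in the example above

# Offsets 0, 10 and 35 from one end, expressed from the opposite end.
mirror_offset(c(0L, 10L, 35L), read_width)
```

With 36-base reads the mirrored value is 35 minus the offset, so the result stays within 0 to 35.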
--------------------------------------
Dario Strbenac
Research Assistant
Cancer Epigenetics
Garvan Institute of Medical Research
Darlinghurst NSW 2010
Australia