[Bioc-sig-seq] ShortRead readAligned Strange Sequences

Dario Strbenac D.Strbenac at garvan.org.au
Thu Apr 29 10:11:23 CEST 2010


Hello,

I was looking at the first few sequences in the object returned by readAligned, e.g.:

> head(sread(aligned))
  A DNAStringSet instance of length 6
    width seq
[1]    36 AACCCTAACCCTAACCCTAACCTTAACCTAACCTTA
[2]    36 TCCGCCTTCAGAGTACCACCGAAATCTGTGCAGAGG
[3]    36 GCCTCTCTGCGCCTGCGCCGGCGGCGTTTCGTTCTC
[4]    36 GCGCGGCGCGCCTCTCGGCGCCTGCGCCGGCGGAGG
[5]    36 GAGGAAAAAGGCAGGACAGAATTACGAGGTGCTGGC
[6]    36 GAAAAAGGCAGGACAGAATTACGAGATGCTGGCNCA

and then at their strands:

> head(strand(aligned))
[1] + + - - + +

When I searched the .map file from this alignment, I could find the first two sequences (both on the + strand), but neither the third nor its complement. The same goes for the fourth, which is also on the - strand. To get the complement I used Biostrings::complementSeq.
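For reference, here is a small base-R sketch for checking which form of a read actually occurs in the .map file. The helper names (complementOnly, reverseComplementStr, searchCandidates, findInMap) are hypothetical. It assumes, as I read the Bowtie manual, that Bowtie's default output stores - strand reads reverse-complemented; if so, the plain complement produced by complementSeq would not be expected to match, while the reverse complement would. Biostrings::reverseComplement does the same job on DNAString objects; plain strings are used here so the sketch has no dependencies.

```r
# Complement only (A<->T, C<->G), without changing the base order
complementOnly <- function(x) chartr("ACGTN", "TGCAN", x)

# Reverse complement: complement each base, then reverse the order
reverseComplementStr <- function(x) {
  vapply(x, function(s) {
    paste(rev(strsplit(complementOnly(s), "")[[1]]), collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

# The three strings worth searching for, for a single read
searchCandidates <- function(read) {
  c(original = read,
    complement = complementOnly(read),
    reverseComplement = reverseComplementStr(read))
}

# TRUE/FALSE per candidate: does it occur anywhere in the .map lines?
findInMap <- function(read, mapLines) {
  vapply(searchCandidates(read),
         function(p) any(grepl(p, mapLines, fixed = TRUE)),
         logical(1))
}

# Usage (the file name is a placeholder for the real Bowtie output):
# mapLines <- readLines("alignments.map")
# findInMap("GCCTCTCTGCGCCTGCGCCGGCGGCGTTTCGTTCTC", mapLines)
```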

Could this be a bug in the way readAligned creates the object?

I also noticed that the mismatch column for negative-strand reads is exactly the same as in the .map file (when I found those reads by chromosome and position - 1, rather than by sequence).

Should this instead be (35 - coordinate) for negative-strand reads, since Bowtie reports all mismatch offsets from the 5' end of the read and ShortRead coordinates are in terms of sequencing cycles?
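A minimal sketch of the conversion being asked about, assuming (1) Bowtie mismatch descriptors have the form offset:refBase>readBase, with 0-based offsets counted from the read's 5' end, and (2) the positions we want are 0-based along the read as stored in reference (+ strand) orientation. Both are assumptions, and the helper names parseOffsets and toReferenceOrientation are hypothetical. For a 36 bp read the flip is 35 - offset, i.e. width - 1 - offset.

```r
# Extract the integer offsets from a descriptor field such as "3:A>G,20:C>T"
parseOffsets <- function(desc) {
  if (is.na(desc) || desc == "") return(integer(0))
  parts <- strsplit(desc, ",", fixed = TRUE)[[1]]
  as.integer(sub(":.*$", "", parts))
}

# Flip 5'-based offsets for - strand reads; + strand offsets are unchanged.
# 'strand' is a single value for the read, 'offset' may be a vector.
toReferenceOrientation <- function(offset, strand, readWidth) {
  if (strand == "-") readWidth - 1L - offset else offset
}

# Usage: a 36 bp read on the - strand with mismatches at offsets 3 and 20
offs <- parseOffsets("3:A>G,20:C>T")
toReferenceOrientation(offs, "-", 36L)
```

If the two assumptions hold, mismatch columns that come out identical to the .map file for - strand reads would point to the offsets not having been flipped.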

--------------------------------------
Dario Strbenac
Research Assistant
Cancer Epigenetics
Garvan Institute of Medical Research
Darlinghurst NSW 2010
Australia


