[Bioc-sig-seq] Bioc short read directions
Herve Pages
hpages at fhcrc.org
Wed Apr 2 21:57:22 CEST 2008
Martin Morgan wrote:
> Herve Pages wrote:
>> Hi Harris,
>>> Nobody seems to have mentioned it, but what about a "both strand"
>>> mode? If RC is reverse-complement,
>>> this feature would basically automate the first statement here:
>>>
>>> Dict = PDict(c(patterns, RC(patterns)))
>>> matchPDict(Dict,Seq)
>>>
>>> The user would just pass 'patterns' and have to say whether he wants
>>> forward and reverse matches
>>> distinguished. The result would be of length 2*length(patterns) as
>>> it is now, if they should be,
>>> but of length length(patterns) if they can be combined.
>>
>> Instead of putting this at the PDict() level, I would rather build some
>> higher level function _on top_ of PDict() that would handle this.
>
> Also, for Solexa-style data and approximate matching, you wouldn't want
> to reverseComplement the reads, because the start and end of the reads
> are not equally trustworthy. reverseComplementing the subject is one
> approach (though quite expensive if the subject is human chromosome 1,
> for instance).
Another approach could be to flip the location of the trusted band:
dict1 = PDict(reads, tb.start=1, tb.end=12) ## Trusted Prefix
dict2 = PDict(RC(reads), tb.start=-12, tb.end=-1) ## Trusted Suffix
but matchPDict() is not ready to handle PDict objects with a trusted suffix yet...
This would be quite expensive too, since it would require reverse-complementing
the entire dictionary (which might cost more than reverse-complementing
human chr1) and building a second PDict object.
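To make the band flip concrete, here is a minimal sketch in base R (no
Biostrings needed). rc() is a made-up stand-in for RC(), the read is invented,
and the sketch assumes that negative tb.start/tb.end values count from the end
of the pattern, as in the snippet above. It only illustrates that the trusted
first 12 letters of a read end up, reverse-complemented, as the last 12 letters
of RC(read); it does not build or match any PDict object.

```r
## Hypothetical helper: reverse-complement plain character strings.
## This is NOT the Biostrings implementation, just a base R stand-in.
rc <- function(x) {
  vapply(strsplit(x, ""), function(chars) {
    ## complement each letter, then reverse the order
    paste(rev(chartr("ACGT", "TGCA", chars)), collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

read <- "ACGTTGCAAGGCTTACGATCGGATC"  # invented 25-letter read
tb_width <- 12                       # trusted band width used in the post

## Trusted prefix of the original read: positions 1..12
trusted_prefix <- substr(read, 1, tb_width)

## After reverse-complementing, the same bases sit in the last 12 positions,
## i.e. positions -12..-1 when counting from the end (assumed convention).
rc_read <- rc(read)
n <- nchar(rc_read)
trusted_suffix <- substr(rc_read, n - tb_width + 1, n)

## The trusted suffix of RC(read) is the reverse complement of the trusted prefix
identical(trusted_suffix, rc(trusted_prefix))
```

So the trusted bases are the same in both dictionaries; only their position
within the pattern changes, which is why the second dictionary would need a
trusted suffix rather than a trusted prefix.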
H.