[Bioc-sig-seq] Bioc short read directions

Herve Pages hpages at fhcrc.org
Wed Apr 2 21:57:22 CEST 2008


Martin Morgan wrote:
> Herve Pages wrote:
>> Hi Harris,
>>> Nobody seems to have mentioned it, but what about a "both strand"
>>> mode? If RC is reverse-complement, this feature would basically
>>> automate the first statement here:
>>>
>>>     Dict = PDict(c(patterns, RC(patterns)))
>>>     matchPDict(Dict, Seq)
>>>
>>> The user would just pass 'patterns' and have to say whether he wants
>>> forward and reverse matches distinguished. The result would be of
>>> length 2*length(patterns), as it is now, if they should be, but of
>>> length length(patterns) if they can be combined.
>>
>> Instead of putting this at the PDict() level, I would rather build some
>> higher level function _on top_ of PDict() that would handle this.
> 
> Also, for Solexa-style data and approximate matching, you wouldn't want
> to reverseComplement the reads, because the start and end of a read
> are not equally trustworthy. Reverse-complementing the subject is one
> approach (though quite expensive if the subject is, for instance,
> human chromosome 1).
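A "both strand" mode built on top of an existing matcher, as suggested above, mostly amounts to searching for patterns and RC(patterns) and then either keeping the two result sets apart or folding result i together with result i + length(patterns). Here is a minimal base-R sketch of that folding, assuming exact matching only. revComp(), exactStarts() and matchBothStrands() are hypothetical names, not Biostrings functions, and the naive substring scan merely stands in for matchPDict().

```r
## Sketch only: base R, exact matching. revComp, exactStarts and
## matchBothStrands are hypothetical helpers, not Biostrings functions.

## Reverse-complement each DNA string in a character vector
revComp <- function(x) {
  vapply(x, function(s) {
    paste(rev(strsplit(chartr("ACGTacgt", "TGCAtgca", s), "")[[1]]),
          collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

## Start positions of all exact (possibly overlapping) occurrences of p in s
exactStarts <- function(p, s) {
  w <- nchar(p)
  n <- nchar(s)
  if (w == 0L || n < w) return(integer(0))
  starts <- seq_len(n - w + 1L)
  starts[substring(s, starts, starts + w - 1L) == p]
}

## Search each pattern on both strands of 'subject'.
## combine = FALSE: forward results, then reverse results (2*length(patterns))
## combine = TRUE:  one merged, sorted vector of starts per pattern
matchBothStrands <- function(patterns, subject, combine = TRUE) {
  fwd <- lapply(patterns, exactStarts, s = subject)
  rc  <- lapply(revComp(patterns), exactStarts, s = subject)
  if (!combine) return(c(fwd, rc))
  mapply(function(a, b) sort(unique(c(a, b))), fwd, rc, SIMPLIFY = FALSE)
}

## Usage: "ACG" hits at 1 and 11, its RC "CGT" at 2 and 12;
## "TTG" hits at 5, its RC "CAA" at 8
res <- matchBothStrands(c("ACG", "TTG"), "ACGTTTGCAAACGT")
## res is a list with elements c(1, 2, 11, 12) and c(5, 8)
```

With combine = FALSE the result has length 2*length(patterns), as described above; with combine = TRUE it has length length(patterns). As Martin points out, reverse-complementing the patterns like this is only harmless for exact matching: with a trusted band it would move the trusted prefix to the other end of each read.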

Another approach could be to reverse-complement the reads and flip the location of the trusted band from the prefix to the suffix:

    dict1 = PDict(reads, tb.start=1, tb.end=12)        ## Trusted Prefix
    dict2 = PDict(RC(reads), tb.start=-12, tb.end=-1)  ## Trusted Suffix

but matchPDict() cannot handle PDict objects with a trusted suffix yet.

This would be quite expensive too, since it would require reverse-complementing
the entire dictionary (which might cost more than just reverse-complementing
human chr1) and building a second PDict object.
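To see why flipping the trusted band and reverse-complementing the subject are interchangeable, here is a toy base-R sketch. bandStarts() is a hypothetical helper loosely modelled on the trusted-band idea (the band must match exactly, and at most max.mm mismatches are allowed elsewhere); it is not how PDict or matchPDict() are implemented. Approach (a) reverse-complements the read and trusts its suffix; approach (b) keeps the read's trusted prefix, scans revComp(subject), and maps each hit back to the plus strand.

```r
## Sketch only: base R, toy mismatch model. revComp and bandStarts are
## hypothetical helpers, not Biostrings functions.

revComp <- function(s)
  paste(rev(strsplit(chartr("ACGTacgt", "TGCAtgca", s), "")[[1]]), collapse = "")

## Start positions in 's' where 'pat' matches with its trusted band
## (first or last 'tb' letters) matching exactly and at most 'max.mm'
## mismatches in the remaining letters
bandStarts <- function(pat, s, tb, max.mm, band = c("prefix", "suffix")) {
  band <- match.arg(band)
  w <- nchar(pat)
  n <- nchar(s)
  if (n < w) return(integer(0))
  p <- strsplit(pat, "")[[1]]
  x <- strsplit(s, "")[[1]]
  trusted <- if (band == "prefix") seq_len(tb) else (w - tb + 1L):w
  hits <- integer(0)
  for (i in seq_len(n - w + 1L)) {
    win <- x[i:(i + w - 1L)]
    if (all(win[trusted] == p[trusted]) &&
        sum(win[-trusted] != p[-trusted]) <= max.mm)
      hits <- c(hits, i)
  }
  hits
}

subject <- "AAGGCTTACGGA"
read    <- "TAAGCA"  # toy read: trusted prefix "TAA", mismatch in its last base

## (a) Flipped band: RC the read, trust its suffix, scan the subject as is
a <- bandStarts(revComp(read), subject, tb = 3L, max.mm = 1L, band = "suffix")

## (b) RC the subject instead, keep the trusted prefix, then map each hit
##     start i on revComp(subject) back to plus-strand start n - i - w + 2
n <- nchar(subject)
w <- nchar(read)
b <- sort(n - bandStarts(read, revComp(subject), tb = 3L, max.mm = 1L) - w + 2L)

stopifnot(identical(a, b))  # both report plus-strand start 3
```

Both report the same plus-strand start, so the choice is about cost: (a) reverse-complements every read and, with PDict, means building a second dictionary, while (b) reverse-complements the subject once.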

H.


