[Bioc-sig-seq] making target from fasta file
Herve Pages
hpages at fhcrc.org
Wed Jun 4 22:18:54 CEST 2008
Hi Joseph,
Joseph Dhahbi, Ph.D. wrote:
>
> Hi Herve
> Thank you very much for your help. Using the built-in masks as you
> suggested was easy.
> Do I need to do it for each chromosome separately? Is there a way to
> apply it to the whole genome and create MaskedDNAString of the whole
> genome?
There is no way to create a MaskedDNAString object for the whole genome.
Note that it would be a very big object, and most machines would not have
enough memory for it. Of course, with a medium-size genome like the Fly,
the problem is not as severe as with the Human genome, but still...
How about using the trick I sent you in a previous email (see that email
for the details):
  > allrepeats <- read.XStringViews("dm3rm", format="fasta", subjectClass="DNAString", collapse="-")
  > c <- countPDict(pdict, subject(allrepeats))
Also, in my previous email, I tried to reproduce the problem you had
with read.DNAStringSet() but couldn't, so I asked for your sessionInfo().
Did read.DNAStringSet() finally work for you?
> Once I create a whole genome MaskedDNAString, I would like to use the
> runAnalysis1 script in the GenomeSearching.pdf to analyze my input
> dictionary.
Look at the runAnalysis2 script. I guess it's closer to what you are
trying to do (you have a dictionary of patterns, not a single pattern).
You'll need to make some modifications though, e.g. use countPDict
instead of matchPDict, and store the results for each chromosome in a
list that you return to the caller at the end of the script. No need
to write the results to a file like in the vignette.
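A rough sketch of that modification, not the vignette's actual code: the
function name countDictPerChromosome and the toy data are made up for
illustration, and the chromosomes are passed in as a plain named list so
the example only needs Biostrings. It assumes all patterns in the
dictionary have the same width.

```r
library(Biostrings)

# Hypothetical helper (not part of Biostrings or the vignette):
# count the hits of each pattern of a dictionary on each chromosome.
#   dict0       - DNAStringSet of equal-width patterns
#   chromosomes - named list of DNAString (or MaskedDNAString) subjects
# Returns a named list with one integer vector of counts per chromosome.
countDictPerChromosome <- function(dict0, chromosomes)
{
    pdict <- PDict(dict0)  # preprocess the dictionary once, reuse it below
    counts <- vector("list", length(chromosomes))
    names(counts) <- names(chromosomes)
    for (seqname in names(chromosomes)) {
        subject <- chromosomes[[seqname]]
        # countPDict instead of matchPDict: keep only the number of hits
        counts[[seqname]] <- countPDict(pdict, subject)
    }
    counts  # returned to the caller, nothing is written to a file
}

# Toy data standing in for a real dictionary and real chromosomes
dict0 <- DNAStringSet(c("ACGT", "TTTT"))
chromosomes <- list(chrA=DNAString("ACGTACGTTTTT"),
                    chrB=DNAString("GGGGACGT"))
counts <- countDictPerChromosome(dict0, chromosomes)
```

With a BSgenome data package you would presumably loop over
seqnames(genome) and load each chromosome with genome[[seqname]] inside
the loop, like the vignette scripts do, so that only one chromosome is
in memory at a time.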
Cheers,
H.