[Bioc-sig-seq] Overall directions (Martin Morgan)
Simon Lin
simonlin at duke.edu
Wed Mar 5 18:55:32 CET 2008
I agree that SQL should be in the design mix. Here are a few additional
thoughts:
1) Let R do what it is good at -- statistics!
2) Reuse established sequence analysis methods: BLAST, assembly,
PHRED/PHRAP/CONSED, etc.
3) Define clear object structures in R, so wrappers can be used and different
algorithms can be tried using the same interface
4) Use SQL as a conduit between R and the sequence analysis results, because
of the large size of the results (not only the raw data)
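
To make point 4 concrete, here is a minimal sketch of the idea, written in
Python with the standard sqlite3 module only because it is self-contained. The
table name "hits", its columns, and the sample rows are all hypothetical and
do not come from any existing pipeline. The external tool's results go into a
table, and only a small per-chromosome summary is pulled back for statistics.

```python
import sqlite3

# Hypothetical alignment hits standing in for the output of an external
# tool: (read id, chromosome, position, alignment score).
hits = [
    ("read1", "chr1", 100, 37),
    ("read2", "chr1", 250, 12),
    ("read3", "chr2", 40, 30),
    ("read4", "chr2", 90, 35),
]


def load_hits(conn, rows):
    """Store the (potentially very large) results in an SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits "
        "(read_id TEXT, chrom TEXT, pos INTEGER, score INTEGER)"
    )
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def summarize(conn, min_score):
    """Return only a small summary: hit count and mean score per chromosome."""
    cur = conn.execute(
        "SELECT chrom, COUNT(*), AVG(score) FROM hits "
        "WHERE score >= ? GROUP BY chrom ORDER BY chrom",
        (min_score,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")  # an on-disk file would be used in practice
load_hits(conn, hits)
summary = summarize(conn, 20)
print(summary)
```

The same SELECT could be issued from R through a database interface package,
so that only the aggregated rows ever need to fit into R's memory.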
Simon Lin
Northwestern
=============================
Maybe a closing thought on this is that the data describing the
experiment might belong in SQL tables (but also fit easily into R's
memory), but it's less clear that the sequences belong in a relational
database. So some other format is likely appropriate for the big
data. Here we've basically been using the disk-based storage
structures implied by output of the Solexa (or other) software
pipeline. Obviously a sub-optimal solution, and it would be great to
hear solutions that other developers have explored.
Martin