[Bioc-sig-seq] Overall directions (Martin Morgan)

Wed Mar 5 18:55:32 CET 2008

I agree that SQL should be in the design mix. Here are a few additional 
thoughts:
1) Let R do what it is good at -- statistics!
2) Reuse established sequence analysis methods: BLAST, assembly, 
PHRED/PHRAP/CONSED, etc.
3) Define clear object structures in R, so wrappers can be used and different 
algorithms can be tried using the same interface
4) Use SQL as a conduit between R and the sequence analysis results, because 
of the large size of the results (not only the raw data)
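
To make point 4 concrete, here is a minimal sketch of the "SQL as a conduit" idea. It is an illustration only: the `hits` table, its columns, and the sample rows are hypothetical stand-ins for the output of an external alignment tool, and Python's built-in `sqlite3` module is used just to keep the example self-contained. From R the same pattern would presumably go through a database interface such as the DBI package with a driver like RSQLite.

```python
import sqlite3

# Hypothetical alignment hits, standing in for the (large) output of an
# external sequence analysis tool: (read_id, chrom, pos, score).
hits = [
    ("read1", "chr1", 100, 37),
    ("read2", "chr1", 250, 12),
    ("read3", "chr2", 75, 40),
    ("read4", "chr2", 900, 35),
    ("read5", "chr2", 910, 8),
]


def load_hits(conn, rows):
    """Store the raw results in a SQL table instead of in memory."""
    conn.execute(
        "CREATE TABLE hits (read_id TEXT, chrom TEXT, pos INTEGER, score INTEGER)"
    )
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def summarize(conn, min_score):
    """Return only a small per-chromosome summary to the analysis side."""
    cur = conn.execute(
        "SELECT chrom, COUNT(*), AVG(score) FROM hits "
        "WHERE score >= ? GROUP BY chrom ORDER BY chrom",
        (min_score,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")  # a file path would be used for real data
load_hits(conn, hits)
summary = summarize(conn, 30)
print(summary)
```

The filtering and aggregation happen in the database, so only the summary rows need to fit in the memory of the statistics environment.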

Simon Lin
Northwestern

=============================
Maybe a closing thought on this is that the data describing the
experiment might belong in SQL tables (but also fit easily into R's
memory), but it's less clear that the sequences belong in a relational
database. So some other format is likely appropriate for the big
data. Here we've basically been using the disk-based storage
structures implied by output of the Solexa (or other) software
pipeline. This is obviously a sub-optimal solution, and it would be
great to hear solutions that other developers have explored.
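
One way to picture this split is a small SQL table that describes the experiment and points at files on disk that hold the sequences. The sketch below is only an assumption about how such a layout might look: the `lanes` table, the sample names, and the one-read-per-line text files are hypothetical and are not the actual Solexa pipeline output format. Python's standard `sqlite3` module is used purely for illustration.

```python
import os
import sqlite3
import tempfile


def write_reads(path, reads):
    """Write reads to a plain-text file, one per line (hypothetical format)."""
    with open(path, "w") as fh:
        for read in reads:
            fh.write(read + "\n")


def iter_reads(path):
    """Stream reads from disk one at a time, without loading the whole file."""
    with open(path) as fh:
        for line in fh:
            yield line.strip()


workdir = tempfile.mkdtemp()

# Hypothetical experiment: lane number -> (sample label, reads).
lanes = {
    1: ("control", ["ACGT", "GGCA"]),
    2: ("treated", ["TTGA", "CCAT", "GATC"]),
}

# The experiment description is small, so it lives in a SQL table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lanes (lane INTEGER PRIMARY KEY, sample TEXT, path TEXT)"
)
for lane, (sample, reads) in lanes.items():
    path = os.path.join(workdir, "lane%d.txt" % lane)
    write_reads(path, reads)  # the big data stays on disk
    conn.execute("INSERT INTO lanes VALUES (?, ?, ?)", (lane, sample, path))
conn.commit()

# Query the metadata, then stream the matching sequence file.
(treated_path,) = conn.execute(
    "SELECT path FROM lanes WHERE sample = ?", ("treated",)
).fetchone()
n_treated = sum(1 for _ in iter_reads(treated_path))
print(n_treated)
```

The database answers "which file holds which sample", while the sequences themselves are only ever read sequentially from disk.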

Martin