[Bioc-sig-seq] Genominator: strategy for combining multiple AlignedRead objects

joseph franklin joseph.franklin at yale.edu
Tue Apr 20 19:23:14 CEST 2010


Kasper,
Thanks, importing from a vector of filenames was working perfectly.  However, after upgrading to Genominator 1.1.6 today and running (I think) the same import command, I get the error below.

I may be doing something wrong without realizing it.  
Thanks again,
Joe

(flyFiles is a vector of filenames)

> flydata<-importFromAlignedReads(x=flyFiles, type="Bowtie" , chrMap=chrMap, dbFilename="~/g/annotation/genominator/flydata.db", tablename="raw")
Error in importToExpData(data.frame(chr = chr, location = loc, strand = str),  : 
  After removing missing locations, df has no rows.
Timing stopped at: 0.94 0.06 0.999 

> sessionInfo()
R version 2.12.0 Under development (unstable) (2010-04-18 r51771) 
x86_64-unknown-linux-gnu 

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] ShortRead_1.5.23     Rsamtools_0.2.8      lattice_0.18-5      
 [4] Biostrings_2.15.27   GenomicRanges_0.1.16 Genominator_1.1.6   
 [7] GenomeGraphs_1.7.2   biomaRt_2.3.5        IRanges_1.5.79      
[10] RSQLite_0.8-4        DBI_0.2-5           

loaded via a namespace (and not attached):
[1] Biobase_2.7.6 hwriter_1.2   RCurl_1.4-1   XML_2.8-1 



On 19 Apr 2010, at 11:26, Kasper Daniel Hansen wrote:

> Hi Joe
> 
> This is addressed in the development version.  We now have the
> capability of giving importFromAlignedReads a (named) vector of
> filenames instead of a named list of AlignedRead objects.  This vector
> of filenames will be read in one at a time, so you just need enough
> memory to process a single lane.  I have processed around 160 lanes
> worth of data using this approach.
> 
> There is an extended example in the 'with ShortRead' vignette.
> 
> importFromAlignedReads also has the capability of directly summing
> several columns (if you need this).  So let us say you have 6 files
> (lanes) and you want to end up with a database with 2 columns
> (assuming you have a 3x2 experiment and you have decided to add up
> over the lanes).  Then you can do this using a construction where the
> names of the files are like
>  "a", "a", "a", "b", "b", "b"
> (this will create two columns named "a" and "b" each holding 3 lanes
> worth of data).
> 
> In this case, all 3 lanes will be read into memory at the same time;
> this is less memory efficient, but it was much easier to code.  If that
> is not feasible, you should create a standard 6-column database and then
> use collapseExpData.  Summing in importFromAlignedReads is more of a
> convenience (and speed) trick.
> 
> I uploaded a new version 1.1.6 yesterday which I recommend, because of
> some documentation updates.  This version should replace 1.1.5 on the
> Bioconductor development servers sometime tomorrow.
> 
> Kasper
> 
> 
> On Mon, Apr 19, 2010 at 11:06 AM, joseph franklin
> <joseph.franklin at yale.edu> wrote:
>> I'm addressing this to Jim Bullard, who has been really helpful answering some of my questions, as well as the list, in case anyone has some advice for me.
>> 
>> I've started using Genominator (I'm using the release version right now) to quantitate and analyze RNA-seq data, and have been really successful aggregating AlignedRead objects with my own annotation tables to produce per-gene counts.  I've done this with sets of 2-3 AlignedRead objects (each representing an Illumina lane), but I'd like to extend the approach to a few dozen lanes.  Since this is far too much data to fit in memory, I need an efficient way to combine many AlignedRead objects at once that doesn't rely on them being loaded as objects at the same time.
>> 
>> I imagine that I need to load the objects into tables using the importFromAlignedReads, and then join the appropriate columns, either before or after aggregation (the manual hints that afterwards is preferable).  However, there are a few points I'm confused with (probably resulting from my limited experience with SQLite):
>> 
>> - I've been unable to load a SQLite database file that was previously created with importFromAlignedReads.  What is the best way to open the database connection, for instance in a new R session?
>> 
>> - Can AlignedRead objects only be imported (via importFromAlignedReads) as named lists of two or more objects?  What about single AlignedRead objects?  I imagine that one solution would be to create a separate table in a database file for each of my AlignedRead objects (I made a loop to do this), and then join these tables (as long as I can create a connection to the database).
>> 
>> I think my problems could be solved if I could load the AlignedRead objects from multiple lanes into tables in a database file, open it, join the appropriate columns from the various tables, and then aggregate with the annotations in a single step (this seems the most straightforward approach).  Any advice on accomplishing these steps would be much appreciated.
>> 
>> Thanks again,
>> Joe Franklin
>> 
>> ________________________________
>> Joseph Franklin
>> Department of Cell Biology
>> Yale University
>> 295 Congress Ave, BCMM 137
>> New Haven, CT 06519
>> USA
>> 
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>> 
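Kasper's point that a vector of filenames is read one lane at a time can be illustrated with a minimal base-R sketch. Each file is read, its counts are added to a running total, and the lane is then dropped, so peak memory is roughly one lane plus the total. The file layout (a table with "location" and "count" columns) and the helper accumulate_lanes are hypothetical. This is not Genominator's internal code.

```r
# Hypothetical illustration of streaming over lane files: only one lane
# (plus the running total) is held in memory at any time.
accumulate_lanes <- function(files, n_locations) {
  total <- integer(n_locations)            # running sum over all lanes
  for (f in files) {
    lane <- read.table(f, header = TRUE)   # assumed columns: location, count
    # sum counts per location within this lane
    agg <- tapply(lane$count, lane$location, sum)
    idx <- as.integer(names(agg))
    total[idx] <- total[idx] + as.integer(agg)
    rm(lane, agg)                          # discard the lane before the next one
  }
  total
}

# Usage: write three small toy lane files to temporary files
files <- vapply(1:3, function(i) {
  f <- tempfile(fileext = ".txt")
  write.table(data.frame(location = c(1, 2, 2, 5), count = c(i, 1, 1, 2)),
              f, row.names = FALSE)
  f
}, character(1))
totals <- accumulate_lanes(files, n_locations = 5)
print(totals)
# [1] 6 6 0 0 6
```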

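The naming convention Kasper describes (file names "a", "a", "a", "b", "b", "b" giving two columns) amounts to grouping files by their names. The sketch below only shows that grouping in base R. The lane file names are made up. The vector itself would be passed as x to importFromAlignedReads, as in the call earlier in this message.

```r
# Hypothetical lane files; names() decides which output column each
# lane is summed into (3 lanes -> "a", 3 lanes -> "b").
flyFiles <- c(a = "lane1.txt", a = "lane2.txt", a = "lane3.txt",
              b = "lane4.txt", b = "lane5.txt", b = "lane6.txt")

# Group the file paths by their names: one list element per output column
groups <- split(unname(flyFiles), names(flyFiles))
print(groups)
print(lengths(groups))   # a: 3 files, b: 3 files
```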

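Kasper's alternative (build a standard 6-column database, then collapse it) can be pictured with an in-memory analogy. Six lane columns are collapsed into two by summing the lanes in each group. The data frame, the lane_group mapping and collapse_lanes are hypothetical. This does not call collapseExpData, and its actual arguments are not shown here.

```r
# Toy per-location counts: 4 locations x 6 lanes (made-up numbers)
counts <- data.frame(a1 = c(1, 0, 2, 0), a2 = c(0, 1, 1, 0), a3 = c(3, 0, 0, 1),
                     b1 = c(0, 2, 0, 0), b2 = c(1, 1, 1, 1), b3 = c(0, 0, 4, 0))
# Hypothetical mapping from lane column to collapsed column
lane_group <- c(a1 = "a", a2 = "a", a3 = "a", b1 = "b", b2 = "b", b3 = "b")

collapse_lanes <- function(counts, lane_group) {
  groups <- split(names(lane_group), lane_group)
  # Sum the lanes within each group: one output column per group
  as.data.frame(lapply(groups, function(cols)
    rowSums(counts[, cols, drop = FALSE])))
}

collapsed <- collapse_lanes(counts, lane_group)
print(collapsed)
```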

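The aggregation step mentioned in the original question (per-location counts plus an annotation table giving per-gene counts) reduces to summing the counts that fall inside each annotated region. Below is a minimal base-R sketch. It assumes a single chromosome, ignores strand, and uses made-up tables with hypothetical column names. It is not how Genominator implements its own summarization.

```r
# Toy per-location counts on one chromosome (made-up data)
counts <- data.frame(location = c(3, 5, 8, 12, 15, 20),
                     count    = c(2, 1, 4, 1, 3, 5))
# Hypothetical annotation: one row per gene, inclusive start/end
genes <- data.frame(gene = c("g1", "g2"), start = c(1, 10), end = c(9, 16),
                    stringsAsFactors = FALSE)

# For each gene, sum the counts whose location lies within [start, end]
gene_counts <- vapply(seq_len(nrow(genes)), function(i) {
  in_gene <- counts$location >= genes$start[i] & counts$location <= genes$end[i]
  sum(counts$count[in_gene])
}, numeric(1))
names(gene_counts) <- genes$gene
print(gene_counts)
```

Location 20 lies outside both genes and is therefore not counted.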