[Bioc-sig-seq] Genominator: strategy for combining multiple AlignedRead objects
Kasper Daniel Hansen
kasperdanielhansen at gmail.com
Tue Apr 20 22:54:17 CEST 2010
Turns out I had the exact same R-devel on my system. And it seems I
have the exact same version of the various packages you have and the
same search order. On this version, the package vignette works.
So I guess this leaves one of the following possibilities: (1) your
flyFiles is somehow wrong, (2) withShortRead does not work on your
system, or (3) there is a bug in Genominator 1.1.6.
If I understand you correctly, the exact same code worked yesterday
with 1.1.5? I am baffled. I have a hard time seeing how a bug could
get introduced that would not also lead to withShortRead failing.
So could you (off-list)
(1) send me a printout of flyFiles
(2) check that withShortRead works
(3) run importFromAlignedReads with verbose = TRUE
(4) run debug(importToExpData) and step through it.
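
For step (1), something along these lines might be a quick sanity check
before sending anything. It is only a sketch in base R: the temporary
files and the names lane1/lane2 are made up to stand in for your real
flyFiles, and steps (3) and (4) are left as comments because they need
Genominator and your actual data.

```r
# Hypothetical stand-in for flyFiles: a named character vector of
# Bowtie output files. Replace this with your real vector.
flyFiles <- c(lane1 = tempfile(fileext = ".txt"),
              lane2 = tempfile(fileext = ".txt"))
writeLines("placeholder", flyFiles[["lane1"]])  # lane2 is deliberately left missing

# (1) Print the vector and check its basic shape.
print(flyFiles)
checks <- c(isCharacter = is.character(flyFiles),
            hasNames    = !is.null(names(flyFiles)) && all(nzchar(names(flyFiles))),
            allExist    = all(file.exists(flyFiles)))
print(checks)

# Entries that point at files R cannot find from the current directory.
missingFiles <- flyFiles[!file.exists(flyFiles)]
print(missingFiles)

# (3) and (4), to be run in a session with Genominator loaded:
# flydata <- importFromAlignedReads(x = flyFiles, type = "Bowtie",
#                                   chrMap = chrMap,
#                                   dbFilename = "~/g/annotation/genominator/flydata.db",
#                                   tablename = "raw", verbose = TRUE)
# debug(importToExpData)  # rerun the import, then step with 'n' in the browser
```

If allExist comes back FALSE, a relative path or a changed working
directory would be the first thing I would look at.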
Kasper
R version 2.12.0 Under development (unstable) (2010-04-18 r51771)
x86_64-unknown-linux-gnu
locale:
[1] LC_CTYPE=en_US.iso885915 LC_NUMERIC=C
[3] LC_TIME=en_US.iso885915 LC_COLLATE=en_US.iso885915
[5] LC_MONETARY=C LC_MESSAGES=en_US.iso885915
[7] LC_PAPER=en_US.iso885915 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C
attached base packages:
[1] grid stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] yeastRNASeq_0.0.3 ShortRead_1.5.23 Rsamtools_0.2.8
[4] lattice_0.18-5 Biostrings_2.15.27 GenomicRanges_0.1.16
[7] Genominator_1.1.6 GenomeGraphs_1.7.2 biomaRt_2.3.5
[10] IRanges_1.5.79 RSQLite_0.8-4 DBI_0.2-5
loaded via a namespace (and not attached):
[1] Biobase_2.7.6 hwriter_1.2 RCurl_1.4-1 tools_2.12.0 XML_2.8-1
On Tue, Apr 20, 2010 at 1:23 PM, joseph franklin
<joseph.franklin at yale.edu> wrote:
> Kasper,
> Thanks--importing from a vector of filenames was working perfectly. However, when I upgraded to Genominator 1.1.6 today and ran the same import command (I think), I got the error below.
>
> I may be doing something wrong without realizing it.
> Thanks again,
> Joe
>
> (flyFiles is a vector of filenames)
>
>> flydata<-importFromAlignedReads(x=flyFiles, type="Bowtie" , chrMap=chrMap, dbFilename="~/g/annotation/genominator/flydata.db", tablename="raw")
> Error in importToExpData(data.frame(chr = chr, location = loc, strand = str), :
> After removing missing locations, df has no rows.
> Timing stopped at: 0.94 0.06 0.999
>
>> sessionInfo()
> R version 2.12.0 Under development (unstable) (2010-04-18 r51771)
> x86_64-unknown-linux-gnu
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] grid stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] ShortRead_1.5.23 Rsamtools_0.2.8 lattice_0.18-5
> [4] Biostrings_2.15.27 GenomicRanges_0.1.16 Genominator_1.1.6
> [7] GenomeGraphs_1.7.2 biomaRt_2.3.5 IRanges_1.5.79
> [10] RSQLite_0.8-4 DBI_0.2-5
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.7.6 hwriter_1.2 RCurl_1.4-1 XML_2.8-1
>
>
>
> On 19 Apr 2010, at 11:26, Kasper Daniel Hansen wrote:
>
>> Hi Joe
>>
>> This is addressed in the development version. We now have the
>> capability of giving importFromAlignedReads a (named) vector of
>> filenames instead of a named list of AlignedRead objects. This vector
>> of filenames will be read in one at a time, so you just need enough
>> memory to process a single lane. I have processed around 160 lanes
>> worth of data using this approach.
>>
>> There is an extended example in the 'with ShortRead' vignette.
>>
>> importFromAlignedReads also has the capability of directly summing
>> several columns (if you need this). So let us say you have 6 files
>> (lanes) and you want to end up with a database with 2 columns
>> (assuming you have a 3x2 experiment and you have decided to add up
>> over the lanes). Then you can do this using a construction where the
>> names of the files are like
>> "a", "a", "a", "b", "b", "b"
>> (this will create two columns named "a" and "b" each holding 3 lanes
>> worth of data).
>>
>> In this case, all 3 lanes will be read into memory at the same time -
>> it is less memory efficient but it was much easier to code. If that
>> does not fit in memory, you should create a standard 6 column database
>> and then use collapseExpData. Summing inside importFromAlignedReads is
>> more of a convenience (and speed) trick.
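
To make the naming scheme above concrete, here is a small base R sketch.
The file names are invented for illustration, and nothing here calls
Genominator; it only shows how repeated names group lanes into columns,
and how unique names would instead set up the 6 column layout.

```r
# Hypothetical file names for a 3x2 design: three lanes per condition.
laneFiles <- c("lane1.txt", "lane2.txt", "lane3.txt",
               "lane4.txt", "lane5.txt", "lane6.txt")

# Repeated names: lanes sharing a name are meant to end up summed
# into a single column ("a" or "b").
names(laneFiles) <- rep(c("a", "b"), each = 3)

# Which files would feed each column.
lanesByColumn <- split(unname(laneFiles), names(laneFiles))
print(lanesByColumn)

# Alternative: unique names, one column per lane, intended to be
# collapsed afterwards (e.g. with collapseExpData).
sixColumnFiles <- setNames(unname(laneFiles), paste0(names(laneFiles), 1:3))
print(names(sixColumnFiles))
```

Either named vector would then be what gets passed as x to
importFromAlignedReads; the second form keeps only one lane in memory at
a time.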
>>
>> I uploaded a new version 1.1.6 yesterday which I recommend, because of
>> some documentation updates. This version should replace 1.1.5 on the
>> Bioconductor development servers sometime tomorrow.
>>
>> Kasper
>>
>>
>> On Mon, Apr 19, 2010 at 11:06 AM, joseph franklin
>> <joseph.franklin at yale.edu> wrote:
>>> I'm addressing this to Jim Bullard, who has been really helpful answering some of my questions, as well as the list, in case anyone has some advice for me.
>>>
>>> I've started using Genominator (I'm using the release version right now) to quantitate and analyze RNA-seq data, and have been really successful aggregating AlignedRead objects with my own annotation tables to produce per-gene counts. I've done this with sets of 2-3 AlignedRead objects (each representing an Illumina lane), but I'd like to extend the approach to a few dozen lanes. Since this is far too much data to fit in memory, I need an efficient way to combine many AlignedRead objects at once that doesn't rely on them being loaded as objects at the same time.
>>>
>>> I imagine that I need to load the objects into tables using importFromAlignedReads, and then join the appropriate columns, either before or after aggregation (the manual hints that afterwards is preferable). However, there are a few points I'm confused about (probably resulting from my limited experience with SQLite):
>>>
>>> - I've been unable to load a SQLite database file that was previously created with importFromAlignedReads--what is the best way to load the database connection--for instance, during a new R session?
>>>
>>> - Can AlignedRead objects only be imported (via importFromAlignedReads) as named lists of two or more objects? What about single AlignedRead objects? I would imagine that a solution to my problem would be to create a separate table in a database file for each of my AlignedRead objects (I made a loop to do this), and then join these tables (as long as I can create a connection to the database).
>>>
>>> I think my problems could be solved if I could load the AlignedRead objects from multiple lanes into tables in a database file, load it, and join the appropriate columns from the various tables (and then aggregate with the annotations in a single step--this would seem to be the most straightforward). Any advice on accomplishing these steps would be much appreciated.
>>>
>>> Thanks again,
>>> Joe Franklin
>>>
>>> ________________________________
>>> Joseph Franklin
>>> Department of Cell Biology
>>> Yale University
>>> 295 Congress Ave, BCMM 137
>>> New Haven, CT 06519
>>> USA
>>>
>>> _______________________________________________
>>> Bioc-sig-sequencing mailing list
>>> Bioc-sig-sequencing at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>
>
>