[Bioc-sig-seq] readAligned error: negative length vectors are not allowed

Mon Mar 9 18:29:55 CET 2009

Hi Joseph --

(your emails contain both a To: and a Cc: bioc-sig-seq, this means
that two copies of each post make it to the list).

joseph <jdsandjd at yahoo.com> writes:

> Hi
> I have 4 data files generated from 2 lanes of a Solexa paired-end run. 
> The qa function worked fine, but the readAligned gave me an error:
>>qaSummary = qa("/../data", type="SolexaExport")
>> qaSummary
> class: SolexaExportQA(9)
> QA elements (access with qa[["elt"]]):
>   readCounts: data.frame(4 3)
>   baseCalls: data.frame(4 5)
>   readQualityScore: data.frame(6144 4)
>   baseQuality: data.frame(376 3)
>   alignQuality: data.frame(279 3)
>   frequentSequences: data.frame(600 4)
>   sequenceDistribution: data.frame(4928 4)
>   perCycle: list(2)
>     baseCall: data.frame(800 4)
>     quality: data.frame(3830 5)
>   perTile: list(2)
>     readCounts: data.frame(1200 4)
>     medianReadQualityScore: data.frame(1200 4)
>
>> qaSummary[["readCounts"]]
>                      read filtered aligned
> s_2_1_export.txt 12915438  7665822 7080343
> s_2_2_export.txt 12915438  7665822 7277877
> s_3_1_export.txt 17296866  7452595 7892453
> s_3_2_export.txt 17296866  7452595 8086091
>> 
>
> When I tried to read all 4 files, I got the error below:
>>aln = readAligned("/../data", type="SolexaExport")
> Error: Input/Output
>   'readAligned' failed to parse files
>   dirPath: '/../data'
>   pattern: ''
>   type: 'SolexaExport'
>   error: negative length vectors are not allowed

"/../data" says "start at the root directory of the file system and go
one level up, then down to the 'data' directory". Is this what you
mean, or perhaps "./../data"?

readAligned as invoked above has no 'pattern' argument, and so will
match all files in the directory '/../data'. Likely what you want is
to add an argument pattern=".*_export.*" or similar; using
list.files("/../data") is a good way to see what you're trying to read
in, for instance list.files("/../data", ".*_export.*")

The error itself likely comes from trying to allocate a very large
object. In general at least for the initial stages of an analysis a
good work flow might read a 'large' file, perform some processing to
result in a 'small' object, read the next, etc., and finally combine
the small objects. This is what qa() does (visit an _export file,
summarize, visit the next, combine results into the object that you
refer to as qaSummary); a simple (and too naive) start to a ChIP-seq
analysis might generate a list of files containing aligned reads and
then summarize where in the genome the reads align to using
ShortRead::coverage (I did not test the following code),

files <- list.files(dirPath, ".*_export.*", full=TRUE)
cvg <- lapply(files, function(file) {
  aln <- readAligned(dirname(file), basename(file), type="SolexaExport")
  coverage(aln)
})

the results of coverage() are quite small and manageable, and the cvg
object contains data from all 'files'; using srapply rather than
lapply would allow this to be distrbuted across processors.

Martin

>
>> sessioInfo()
> R version 2.8.1 Patched (2009-03-03 r48046) 
> i386-apple-darwin9.6.0 
> locale:
> en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
> attached base packages:
> [1] tools     stats     graphics  grDevices utils     datasets  methods  
> [8] base     
> other attached packages:
> [1] ShortRead_1.0.6    lattice_0.17-20    Biobase_2.2.2      Biostrings_2.10.16
> [5] IRanges_1.0.13    
>
>
>       
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793