[Bioc-sig-seq] readAligned error: negative length vectors are not allowed
Martin Morgan
mtmorgan at fhcrc.org
Mon Mar 9 18:29:55 CET 2009
Hi Joseph --
(your emails contain both a To: and a Cc: bioc-sig-seq, this means
that two copies of each post make it to the list).
joseph <jdsandjd at yahoo.com> writes:
> Hi
> I have 4 data files generated from 2 lanes of a Solexa paired-end run.
> The qa function worked fine, but the readAligned gave me an error:
>>qaSummary = qa("/../data", type="SolexaExport")
>> qaSummary
> class: SolexaExportQA(9)
> QA elements (access with qa[["elt"]]):
> readCounts: data.frame(4 3)
> baseCalls: data.frame(4 5)
> readQualityScore: data.frame(6144 4)
> baseQuality: data.frame(376 3)
> alignQuality: data.frame(279 3)
> frequentSequences: data.frame(600 4)
> sequenceDistribution: data.frame(4928 4)
> perCycle: list(2)
> baseCall: data.frame(800 4)
> quality: data.frame(3830 5)
> perTile: list(2)
> readCounts: data.frame(1200 4)
> medianReadQualityScore: data.frame(1200 4)
>
>> qaSummary[["readCounts"]]
> read filtered aligned
> s_2_1_export.txt 12915438 7665822 7080343
> s_2_2_export.txt 12915438 7665822 7277877
> s_3_1_export.txt 17296866 7452595 7892453
> s_3_2_export.txt 17296866 7452595 8086091
>>
>
> When I tried to read all 4 files, I got the error below:
>>aln = readAligned("/../data", type="SolexaExport")
> Error: Input/Output
> 'readAligned' failed to parse files
> dirPath: '/../data'
> pattern: ''
> type: 'SolexaExport'
> error: negative length vectors are not allowed
"/../data" says "start at the root directory of the file system and go
one level up, then down to the 'data' directory". Is this what you
mean, or perhaps "./../data"?
readAligned as invoked above has no 'pattern' argument, and so will
match all files in the directory '/../data'. Likely what you want is
to add an argument pattern=".*_export.*" or similar; using
list.files("/../data") is a good way to see what you're trying to read
in, for instance list.files("/../data", ".*_export.*")
The error itself likely comes from trying to allocate a very large
object. In general at least for the initial stages of an analysis a
good work flow might read a 'large' file, perform some processing to
result in a 'small' object, read the next, etc., and finally combine
the small objects. This is what qa() does (visit an _export file,
summarize, visit the next, combine results into the object that you
refer to as qaSummary); a simple (and too naive) start to a ChIP-seq
analysis might generate a list of files containing aligned reads and
then summarize where in the genome the reads align to using
ShortRead::coverage (I did not test the following code),
files <- list.files(dirPath, ".*_export.*", full=TRUE)
cvg <- lapply(files, function(file) {
aln <- readAligned(dirname(file), basename(file), type="SolexaExport")
coverage(aln)
})
the results of coverage() are quite small and manageable, and the cvg
object contains data from all 'files'; using srapply rather than
lapply would allow this to be distrbuted across processors.
Martin
>
>> sessioInfo()
> R version 2.8.1 Patched (2009-03-03 r48046)
> i386-apple-darwin9.6.0
> locale:
> en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
> attached base packages:
> [1] tools stats graphics grDevices utils datasets methods
> [8] base
> other attached packages:
> [1] ShortRead_1.0.6 lattice_0.17-20 Biobase_2.2.2 Biostrings_2.10.16
> [5] IRanges_1.0.13
>
>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M2 B169
Phone: (206) 667-2793
More information about the Bioc-sig-sequencing
mailing list