[BioC] TEQC package very slow
nathalie
nac at sanger.ac.uk
Wed Jun 13 16:53:43 CEST 2012
Hi,
This is the error message produced at the
myreadpair<-reads2pairs(myread) stage after it had been running for 7 hours:
> readpairs4_2_PigS<-reads2pairs(reads4_2_PigS)
[1] "there were 1453928 reads found without matching second read, or
whose second read matches to a different chromosome"
Error in endoapply(reads, mergefun) :
'FUN' did not produce an endomorphism
> Terminated
That may help.
Thanks,
On 13/06/12 12:07, nathalie wrote:
> Hi,
> I am analysing coverage data using the TEQC package from Bioconductor
> for quality assessment of a target enrichment experiment.
> I am using a computer cluster farm to do the analysis and asked for
> large memory to be allocated. My BAM files are 11 GB in size, and the
> analysis takes very long (several hours), after which my session
> exits. Do I need to ask for this to be put on a long queue, for jobs
> of more than 12 hours? Do people use TEQC with large files? How can I
> be more efficient with this analysis?
> These are my commands:
> #get reads
> myread<-get.reads("reads.bam",filetype="bam")
> #get read pairs: this is the point where it fails; the documentation
> states "To run the function can be quite time consuming, depending on
> the number of reads"
> myreadpair<-reads2pairs(myread)
>
> #drop single reads
> myread<-myread[!(myread$ID %in% myreadpair$singleReads$ID), , drop=TRUE]
>
>
> I have used these functions efficiently on smaller files with MiSeq
> data, but not yet with HiSeq data.
> Many thanks for sharing your experience in getting QC for large files
> efficiently.
> Nathalie
>
> > sessionInfo()
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
> [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=C
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] TEQC_2.4.0 hwriter_1.3 Rsamtools_1.8.4
> [4] Biostrings_2.24.1 GenomicRanges_1.8.3 IRanges_1.14.2
> [7] BiocGenerics_0.2.0
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.16.0 bitops_1.0-4.1 stats4_2.15.0 zlibbioc_1.2.0
>