[BioC] TEQC package very slow
nathalie
nac at sanger.ac.uk
Wed Jun 13 13:07:49 CEST 2012
Hi,
I am analysing coverage data with the TEQC package from Bioconductor for
quality assessment of a target enrichment experiment.
I am running the analysis on a compute cluster and have requested a large
memory allocation. My BAM files are 11 GB in size, and the analysis runs
for several hours before my session exits. Do I need to submit this to a
long queue (jobs of more than 12 hours)? Do people use TEQC with large
files? How can I make this analysis more efficient?
These are my commands:
# get reads
myread <- get.reads("reads.bam", filetype="bam")
# get read pairs: this is the step that fails. The documentation states:
# "To run the function can be quite time consuming, depending on
# the number of reads"
myreadpair <- reads2pairs(myread)
# drop single reads
myread <- myread[!(myread$ID %in% myreadpair$singleReads$ID), , drop=TRUE]
I have used these functions efficiently on smaller files with MiSeq
data, but not yet with HiSeq data.
Many thanks for sharing your experience with running QC on large files
efficiently.
Nathalie
> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=C
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] TEQC_2.4.0 hwriter_1.3 Rsamtools_1.8.4
[4] Biostrings_2.24.1 GenomicRanges_1.8.3 IRanges_1.14.2
[7] BiocGenerics_0.2.0
loaded via a namespace (and not attached):
[1] Biobase_2.16.0 bitops_1.0-4.1 stats4_2.15.0 zlibbioc_1.2.0