[Bioc-sig-seq] PDict question
Robert Gentleman
rgentlem at fhcrc.org
Tue Jun 3 18:39:38 CEST 2008
Hi,
Whether you have enough RAM will be a function of lots of things -
which genome you are matching to (some are larger than others), how
careful you are about dropping sequences when they are not needed, etc.
As you have not provided those details, it is not possible to give
more concrete advice.
You could try breaking the sequences down into smaller subsets: say,
build a PDict on half of the data, match, then repeat on the other half
(or thirds, or whatever size works for your architecture). In
general, you may want to consider moving the analysis to a Linux box
with more RAM (we typically use machines with 32 or 64 GB of RAM, which
are surprisingly inexpensive these days).
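The split-and-repeat idea above could be sketched roughly as follows. This
is only an illustration, not a tested recipe for your data:
count_hits_chunked and chunk_size are hypothetical names, the toy reads and
subject are made up, and it assumes all reads have the same width (as your
36-mers do) and that counting exact hits with countPDict is what you want.

```r
library(Biostrings)

# Hypothetical helper (not part of Biostrings): count exact hits of every
# read in `reads` against one `subject`, building a PDict for only one
# chunk of reads at a time so that the Aho-Corasick tree stays small.
count_hits_chunked <- function(reads, subject, chunk_size) {
  n <- length(reads)
  if (n == 0L) return(integer(0))
  hits <- integer(n)
  for (s in seq(1L, n, by = chunk_size)) {
    idx <- s:min(s + chunk_size - 1L, n)
    pd <- PDict(reads[idx])               # dictionary for this chunk only
    hits[idx] <- countPDict(pd, subject)  # one count per read in the chunk
    rm(pd)                                # drop the dictionary before the
    gc()                                  # next chunk is built
  }
  hits
}

# Toy data, made up for illustration (constant-width reads).
reads   <- DNAStringSet(c("ACGT", "TTTT", "CGTA", "GGGG"))
subject <- DNAString("ACGTACGTTTTT")
hits    <- count_hits_chunked(reads, subject, chunk_size = 2L)
```

The memory needed for each PDict should shrink as chunk_size shrinks, at the
cost of scanning the subject once per chunk, so the largest chunk that
still fits in memory is probably the one to aim for.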
best wishes
Robert
Stephen Henderson wrote:
> Hi Joseph
> You look like you should have enough RAM on your MacPro. Have you compiled a 64-bit version of R for the Mac? The CRAN binaries are 32-bit and will restrict the available memory.
>
> Stephen
>
>
> ________________________________
>
> From: bioc-sig-sequencing-bounces at r-project.org on behalf of Joseph Dhahbi, P.h.D.
> Sent: Tue 03/06/2008 16:21
> To: bioc-sig-sequencing at r-project.org
> Cc: bioc-sig-sequencing at r-project.org
> Subject: [Bioc-sig-seq] PDict question
>
>
>
> Hello
> I need help on how to get around the memory error reported
> below, especially since I cannot add any more RAM:
> Here is the Hardware Overview:
> Model Name: Mac Pro
> Model Identifier: MacPro1,1
> Processor Name: Dual-Core Intel Xeon
> Processor Speed: 2.66 GHz
> Number Of Processors: 2
> Total Number Of Cores: 4
> L2 Cache (per processor): 4 MB
> Memory: 20 GB
> Bus Speed: 1.33 GHz
> Boot ROM Version: MP11.005C.B08
> SMC Version: 1.7f10
> Serial Number: G87052SGUPZ
>
>
>
>> NM_seq=readSolexaFastA(NM_fa)
>> NM_alf=alphabetFrequency(NM_seq, baseOnly=TRUE)
>> NM_seq_clean = NM_seq[NM_alf[,"other"]==0]
>> length(NM_seq)
> [1] 4820218
>> length(NM_seq_clean)
> [1] 4817537
>> NM_seq_clean
> A DNAStringSet instance of length 4817537
> width seq
> [1] 36 GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGGAT
> [2] 36 GTGGTAATTCATCAGATCTCGGATGGCATTGGTCAT
> [3] 36 GGGAGGTCACTAATGGAGACACACAGAAATGTAACA
> [4] 36 GGGATTGGTTTTTTGTTACTGATTTGTTTGAGTTCA
> [5] 36 GTGGTAATTTTGACTTTTTAGGTTAATTTATTTTTT
> [6] 36 GATCGGAAGGAGCTCGTATGCCGTCTTCTGCTTAGA
> [7] 36 GGTCAGTTGTGTTCTCCTGAGTAGGTTGTGTGAATG
> [8] 36 GGGAGGTCACTAATGGAGACACACAGAAATGTAACA
> [9] 36 GGGAGGCTGAGGCAGGAGAATGGCATGAACCTAGAT
> ... ... ...
> [4817529] 36 TTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAG
> [4817530] 36 CATCAATGTATCTTAAGGCGTAAATTGTAAGCGTTA
> [4817531] 36 CGAGCAGCGACGCATCACCCAGCTAGATCGGAAGAG
> [4817532] 36 GCAATGCCACTGGCGCGACAACCGGGACACCATAGG
> [4817533] 36 CCTCGCCGGACACGCTGAACTTGTGGCCGTTTTCGT
> [4817534] 36 CCATTGTACAACGTATCGACATATCCTCCACCCGCC
> [4817535] 36 CCCCCTGAACCTGAAACATAAAATGAATGCAATTGT
> [4817536] 36 ACCATGTTGTCCAAGGGCGAATTCTGCAGATATCCA
> [4817537] 36 CAGGGGCCGGCGGCTGGCTAGGGCTGCAGCGTTAAA
>
>> NM_seq_pDict=PDict(NM_seq_clean)
> Error in .PDict(dict, names(dict), tb.start, tb.end,
> drop.head, drop.tail, :
> alloc_actree_nodes_buf(): failed to alloc
> actree_nodes_buf
> R(433,0xa000d000) malloc: *** vm_allocate(size=4032987136)
> failed (error code=3)
> R(433,0xa000d000) malloc: *** error: can't allocate region
> R(433,0xa000d000) malloc: *** set a breakpoint in
> szone_error to debug
>
>> sessionInfo()
> R version 2.7.0 (2008-04-22)
> i386-apple-darwin8.10.1
>
> locale:
> en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] tools stats graphics grDevices utils
> datasets methods base
>
> other attached packages:
> [1] BiostringsCinterfaceDemo_0.1.2 Biostrings_2.8.9
> Biobase_2.0.1
>
>
>
>
> Regards,
> Joseph
>
> Joseph M. Dhahbi, PhD
> Childrens Hospital Oakland Research Institute
> 5700 Martin Luther King Jr. Way
> Oakland, CA 94609
> USA
> Ph.(510)428-3885 EXT.5743
> Cell.(702)335-0795
> Fax (510)450-7910
> jdhahbi at chori.org
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org