[Bioc-sig-seq] getSeq and Btaurus$chrUn.scaffolds - ambiguous name error

Janet Young jayoung at fhcrc.org
Tue Nov 16 03:51:01 CET 2010


Hi again,

I'm interested in some sequences on the cow chrUn scaffolds, and am  
having a bit of bother getting them.  I think I might have uncovered a  
bug, although I might just be doing something wrong.  The code and  
output below should explain all.  Any suggestions?

thanks (again!),

Janet Young

-------------------------------------------------------------------

Dr. Janet Young (Trask lab)

Fred Hutchinson Cancer Research Center
1100 Fairview Avenue N., C3-168,
P.O. Box 19024, Seattle, WA 98109-1024, USA.

tel: (206) 667 1471 fax: (206) 667 6524
email: jayoung  ...at...  fhcrc.org

http://www.fhcrc.org/labs/trask/

-------------------------------------------------------------------


library(BSgenome.Btaurus.UCSC.bosTau4)

######  this works, gives me a 50bp sequence
getSeq(Btaurus,"chr1",start=1,end=50)
[1] "TACCCCACTCACACTTATGGATAGATCAACTAAACAGAAAATTAACAAGG"

####### for some scaffolds in the chrUn.scaffolds pile, I don't get an
####### error message, but getSeq seems to ignore the start and end
####### coordinates requested - e.g. the sequence returned here is the whole
####### scaffold, not just the first 50bp
getSeq(Btaurus,"chrUn.004.11829",start=1,end=50)
[1]  
"TCATGTGTTTCTTCCAGTCCAGCATTTCTCATGATGTACTCTGCATATAAGTTAAATAAACAGGGTGACAA 
TATACAGCCTTGATGAACTCCTTTTCCTATTTGGAACCAGTCTGTTGTTCCATGTCCAGTTCTAACTGTTGCTTCCTGACCTGCATACAGATTTCTCAAGAGGCAGATCAGGTGTTCTCATCTCCTGAGAATTGAAGGTACAAATTGTAGTGTTTCAATTGGCACCATGCTAATTTATCTTGGCCTAAAATAGTGAATGGGCTTCCCTGGTGGCTCAGGTGGTAAAGAATCTGCCTGCAATGCTGGAGACCTGGGTTCAATATCTGGGTTGGGAAGATTACCTGGAGGAGGGCATGGAGGCTTACTCGAATATTCTTGCCTGGAAAATCTCCATGGACAGAGAAGCTGGGTGGGTTACTGTCCATGGGGTCGCAAAGAGTCAGACGTGACTGAGCAACTAAGCACAGCACAACACAAAATAGTGAATACTGAGCAAGTAAAGGAAAAACCTCTTCCTCTCAGAAATTGGTCTTCATTTTTTCATGAGAATTGCTAGTCTTCCTCCCAAAGCCAAAACCATAAATTTGTTAGTGTTTGACCTCAATATATTTTCTCTTAACTCAGCTTTTAAACCTTCTCTGCCTCCTGCTACCATTCACTTTCTAGTACATTTGAAATCTGTCCAAGCCATTCCTGGGGTTCAGGTGTCTGAGACCTGATTTATTTCATTGATATATTAAAACACCCTTGAATCCAGCCAACGTATGTGGCCAGTTTTACTTGCTTTGCTCCCATACTGGTAATGGAATTTTTATGGCTGTAAAATATCTGGGTCATGTGGCATTTTCATCTTCTGTTGTCTTGAGCTGGTATAGTTTTACCAACGTGCCATTAAGGGATGGTTCCTTTACCATCATTGTGCTTCCTGGGGCCTTGCCCACTTTGCACTGTAAGTCAGAACAAGAGACCCTCCAAGTATTTAATTTCC"

#### for other scaffolds I just get an error message, although the
#### named scaffold definitely exists (is it doing a partial match on the
#### name, not an exact match?)

getSeq(Btaurus,"chrUn.004.1022")
Error in .getOneSeqFromBSgenomeMultipleSequences(x, names[i], start[i],  :
   sequence chrUn.004.1022 found more than once, please use a non-ambiguous name

which ( names(Btaurus$chrUn.scaffolds) == "chrUn.004.1022" )
[1] 1022

grep (  "chrUn.004.1022" , names(Btaurus$chrUn.scaffolds) )
[1]  1022 10220 10221 10222 10223 10224 10225 10226 10227 10228 10229
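To show what I mean by partial vs. exact matching without needing the genome package, here is a minimal sketch in base R. The vector toy_names is made up for illustration (it is not the real scaffold list), and I am only guessing that something like the grep behaviour is what happens inside getSeq.

```r
# Hypothetical stand-in for names(Btaurus$chrUn.scaffolds); not the real data
toy_names <- c("chrUn.004.1021", "chrUn.004.1022",
               "chrUn.004.10220", "chrUn.004.10221")
target <- "chrUn.004.1022"

# grep() treats the name as a regular expression and matches it anywhere
# in each string, so longer names that contain the target also hit
partial_hits <- grep(target, toy_names)

# == and match() compare whole strings, so only the exact name hits
exact_hits  <- which(toy_names == target)
exact_first <- match(target, toy_names)

print(partial_hits)
print(exact_hits)
```

With the toy vector, the grep call picks up the longer names as well as the exact one, which looks like the same pattern as the 1022 / 10220-10229 result above.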

sessionInfo()

R version 2.12.0 (2010-10-15)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] BSgenome.Btaurus.UCSC.bosTau4_1.3.16 BSgenome_1.18.1
[3] Biostrings_2.18.0                    GenomicRanges_1.2.1
[5] IRanges_1.8.2

loaded via a namespace (and not attached):
[1] Biobase_2.10.0


