[Bioc-sig-seq] readAligned for Illumina's export file

Martin Morgan mtmorgan at fhcrc.org
Fri Nov 5 17:49:02 CET 2010


On 11/05/2010 09:27 AM, Martin Morgan wrote:
> On 11/05/2010 08:51 AM, Kunbin Qu wrote:
>> Dear all,
>>
>> can readAligned() or other function read in the reads mapped across
>> the junctions in the "export" file (eg, s_1_export.txt)  from
>> Illumina's pipeline? The following is the example of a regular
>> mapping entry and a read mapped across two exons. I had a test file
>> named s1Test, and when I used the following command, it can only read
>> in the first read. Thanks.
> 
> It's tricky to know what your file looks like, but this should be parsed
> by readAligned.
> 
>> x = readAligned("/tmp/kunbin_export.txt", type="SolexaExport")
>> x
> class: AlignedRead
> length: 2 reads; width: 51 cycles
> chromosome: chrX.fa splice_sites-auto.faDHRS7_50_50_chr14.fa_59681484_59685824
> position: 108773654 20
> strand: + -
> alignQuality: NumericQuality
> alignData varLabels: run lane ... filtering contig
>> sread(x)
>   A DNAStringSet instance of length 2
>     width seq
> [1]    51 NTTTTAAAAACAGAATTTCTGCTCTATAATAACACAGCTAAAGGGAAATAA
> [2]    51 NGAACTTTAAGAGTGGTGTGGATGCAGACTCTTCTTATTTTAAAATCTTTA
>> quality(x)
> class: SFastqQuality
> quality:
>   A BStringSet instance of length 2
>     width seq
> [1]    51 BKOJHRQPPO_QQ_____b_b___b_bb_bb__bb__b_b___bbb_b__Q
> [2]    51 BKIKKUUTTU_____[[[[[[[[[[_b_____b______QQQ__b___b__
> 
> Maybe your 'cfil' filters out 'chromosomes' (which should probably have
> been something else, rseq?)
> 
>> chromosome(x)
> [1] chrX.fa
> [2] splice_sites-auto.faDHRS7_50_50_chr14.fa_59681484_59685824
> 2 Levels: chrX.fa ...
> 
> More hints on what 'it can only read the first read' means might help.

I also meant to say that the AlignedRead class expects ungapped
alignments. I do not think that is an issue for the reads you present,
but it is worth keeping in mind; the 'GappedAlignments' class in
GenomicRanges is a work in progress for some gapped alignment use
cases.

Martin
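
To illustrate how a chromosome filter could account for "only the first read" being returned, here is a minimal base-R sketch. It assumes the filter is a regular-expression match on chromosome names, which is how chromosomeFilter in ShortRead is documented to behave. The two patterns are hypothetical, since the actual definition of 'cfil' was not posted; the chromosome names are the ones shown by chromosome(x) in this thread.

```r
# Chromosome names as reported by chromosome(x) for the two example reads.
chrom <- c("chrX.fa",
           "splice_sites-auto.faDHRS7_50_50_chr14.fa_59681484_59685824")

# Hypothetical patterns (assumptions; the real 'cfil' is unknown).
strict <- "^chr[0-9XY]+\\.fa$"  # whole name must be a plain chromosome
loose  <- "chr[0-9XY]+\\.fa"    # chromosome name may occur anywhere

keep_strict <- grepl(strict, chrom)
keep_loose  <- grepl(loose, chrom)

# The anchored pattern keeps only the first read; the unanchored one keeps both.
print(data.frame(keep_strict, keep_loose,
                 row.names = c("chrX", "junction")))
```

If the real filter behaves like the anchored pattern, the junction read is dropped at input, and reading the file once without 'filter=' would confirm that.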

> 
> Martin
> 
> 
>>
>> -Kunbin
>>
>> SEQUENCER01     10      1       1       5110    943     0       1
>> NTTTTAAAAACAGAATTTCTGCTCTATAATAACACAGCTAAAGGGAAATAA
>> BKOJHRQPPO_QQ_____b_b___b_bb_bb__bb__b_b___bbb_b__Q     chrX.fa
>> 108773654    F       T50     199
>> Y
>>
>> SEQUENCER01     10      1       1       2815    941     0       1
>> NGAACTTTAAGAGTGGTGTGGATGCAGACTCTTCTTATTTTAAAATCTTTA
>> BKIKKUUTTU_____[[[[[[[[[[_b_____b______QQQ__b___b__
>> splice_sites-auto.faDHRS7_50_50_chr14.fa_59681484_59685824   20
>> R       A50     200                                 Y
>>
>>
>>
>>
>>> s1t<-readAligned("./", pattern="s1Test", type="SolexaExport",
>>> filter=cfil)
>>> sessionInfo()
>> R version 2.11.0 (2010-04-22)
>> x86_64-unknown-linux-gnu
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] ShortRead_1.6.2     Rsamtools_1.0.1     lattice_0.19-11
>> [4] Biostrings_2.16.7   GenomicRanges_1.0.1 IRanges_1.6.8
>>
>> loaded via a namespace (and not attached):
>> [1] Biobase_2.8.0 grid_2.11.0   hwriter_1.2   tools_2.11.0
>>>
>>
>>
>>
> 
> 


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


