[Bioc-sig-seq] readAligned for Illumina's export file
Martin Morgan
mtmorgan at fhcrc.org
Fri Nov 5 17:49:02 CET 2010
On 11/05/2010 09:27 AM, Martin Morgan wrote:
> On 11/05/2010 08:51 AM, Kunbin Qu wrote:
>> Dear all,
>>
>> Can readAligned() or another function read in reads mapped across
>> junctions in the "export" file (e.g., s_1_export.txt) from
>> Illumina's pipeline? Below is an example of a regular mapping entry
>> and of a read mapped across two exons. I had a test file named
>> s1Test, and when I used the command below, only the first read was
>> read in. Thanks.
>
> It's tricky to know what your file looks like, but this should be parsed
> by readAligned.
>
>> x = readAligned("/tmp/kunbin_export.txt", type="SolexaExport")
>> x
> class: AlignedRead
> length: 2 reads; width: 51 cycles
> chromosome: chrX.fa
> splice_sites-auto.faDHRS7_50_50_chr14.fa_59681484_59685824
> position: 108773654 20
> strand: + -
> alignQuality: NumericQuality
> alignData varLabels: run lane ... filtering contig
>> sread(x)
> A DNAStringSet instance of length 2
> width seq
> [1] 51 NTTTTAAAAACAGAATTTCTGCTCTATAATAACACAGCTAAAGGGAAATAA
> [2] 51 NGAACTTTAAGAGTGGTGTGGATGCAGACTCTTCTTATTTTAAAATCTTTA
>> quality(x)
> class: SFastqQuality
> quality:
> A BStringSet instance of length 2
> width seq
> [1] 51 BKOJHRQPPO_QQ_____b_b___b_bb_bb__bb__b_b___bbb_b__Q
> [2] 51 BKIKKUUTTU_____[[[[[[[[[[_b_____b______QQQ__b___b__
>
> Maybe your 'cfil' filters out 'chromosomes' (which should probably have
> been something else, rseq?)
>
>> chromosome(x)
> [1] chrX.fa
> [2] splice_sites-auto.faDHRS7_50_50_chr14.fa_59681484_59685824
> 2 Levels: chrX.fa ...
>
> More hints on what 'it can only read the first read' means might help.
I meant also to say that the AlignedRead class expects ungapped
alignments. I do not think that is an issue with the reads you present,
but it might be worth keeping in mind; the 'GappedAlignments' class in
GenomicRanges represents a work in progress for some gapped alignment
use cases.

Martin
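
A small sketch of the filter hypothesis quoted above, using base R only.
The definition of 'cfil' was not posted, so the pattern "^chr" below is
purely an assumption chosen for illustration. It shows how a filter
anchored on a "chr" prefix would keep the chrX.fa read and drop the
splice-junction read.

```r
# Chromosome labels as reported by chromosome(x) in the session above
chrom <- c("chrX.fa",
           "splice_sites-auto.faDHRS7_50_50_chr14.fa_59681484_59685824")

# Hypothetical pattern; the poster's actual 'cfil' definition is unknown
pattern <- "^chr"

keep <- grepl(pattern, chrom)   # TRUE where the label starts with "chr"
n_kept <- sum(keep)             # number of reads that would survive
kept <- chrom[keep]             # labels of the surviving reads
```

If something like this is the cause, reading the file once without the
filter= argument and comparing length() of the two results should
confirm it.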
>
> Martin
>
>
>>
>> -Kunbin
>>
>> SEQUENCER01 10 1 1 5110 943 0 1
>> NTTTTAAAAACAGAATTTCTGCTCTATAATAACACAGCTAAAGGGAAATAA
>> BKOJHRQPPO_QQ_____b_b___b_bb_bb__bb__b_b___bbb_b__Q chrX.fa
>> 108773654 F T50 199
>> Y
>>
>> SEQUENCER01 10 1 1 2815 941 0 1
>> NGAACTTTAAGAGTGGTGTGGATGCAGACTCTTCTTATTTTAAAATCTTTA
>> BKIKKUUTTU_____[[[[[[[[[[_b_____b______QQQ__b___b__
>> splice_sites-auto.faDHRS7_50_50_chr14.fa_59681484_59685824 20
>> R A50 200 Y
>>
>>
>>
>>
>>> s1t <- readAligned("./", pattern="s1Test", type="SolexaExport",
>>>                    filter=cfil)
>>> sessionInfo()
>> R version 2.11.0 (2010-04-22)
>> x86_64-unknown-linux-gnu
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] ShortRead_1.6.2     Rsamtools_1.0.1     lattice_0.19-11
>> [4] Biostrings_2.16.7   GenomicRanges_1.0.1 IRanges_1.6.8
>>
>> loaded via a namespace (and not attached):
>> [1] Biobase_2.8.0 grid_2.11.0   hwriter_1.2   tools_2.11.0
>>
>> _______________________________________________ Bioc-sig-sequencing
>> mailing list Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
>
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793