[Bioc-sig-seq] 'coverage' error message (2)
Kasper Daniel Hansen
khansen at stat.berkeley.edu
Fri Jan 8 18:28:04 CET 2010
I have only spend 5s reading your email.
But it seems you have the incredible common problem of chromosome name mismatch. For a given organism there is no standard for how the chr are named. Depending on the source (genome, what source, what annotation etc), I have -- for the same genome -- seen at least
chr1
chr01
chrI (<- roman 1)
scchr1
I would not find it strange at all if whoever aligned using ELAND used a different chr. name. This is pretty hard to make work automatically. You usually have to modify the objects yourself.
Kasper
On Jan 7, 2010, at 15:59 PM, pterry at huskers.unl.edu wrote:
> Dear bioc-sig-sequencing,
>
> I am trying to analyze Eland aligned files (Arabidopsis) for differential expression, using as a guide the 'A ChIP-Seq Data Analysis' handout from a 11/19/09 session at the 'High throughput sequence analysis tools and approaches with Bioconductor' workshop in Seattle.
>
> I generated the error message in the following output. Can you comment?
>
> Notes:
>
> i. This is my second email on this problem. My first email omitted some info which appears to me may have affected the response by persons monitoring the mailing list.
>
> ii. One misunderstanding would appear to be that I used the mouse 'reads' data supplied during the lab. Instead, I used what I'm told is Eland-aligned Arabidopsis data (s_8_export_chr1.txt file derived from s_8_export.txt using grep).
>
> iii. The error message suggests a mismatch between:
>> names(arab.chromlens)
> [1] "chr1" "chr2" "chr3" "chr4" "chr5" "chrC" "chrM"
> and
>> levels(chromosome(alns_8))
> [1] "chr1.fas"
>
> I don't know what to do about this 'mismatch'? Perhaps I need to arrange so:
>> names(arab.chromlens)
> gives output of only:
> "chr1"?
>
> iv. I note using 'available.genomes()', there are two BSgenome data packages for Arabidopsis. Could my arbitrary choice be a problem? Would one have to coordinate the choice with the Arabidopsis genome Eland must have used during alignment?
>
>
> ...
>
>> cerudataDir <- "/home/mterry/data09/wang_892spr09/rob_hw6/ceru_data"
>> cerudataDir
> [1] "/home/mterry/data09/wang_892spr09/rob_hw6/ceru_data"
>
>> pattern <- "s_8_export_chr1.txt"
>> list.files(cerudataDir, pattern)
> [1] "s_8_export_chr1.txt"
>> filt1 <- alignDataFilter(expression(filtering=="Y"))
>> filt2 <- chromosomeFilter("chr[0-9XYM]+.fa")
>> filt <- compose(filt1, filt2)
>> alns_8 <- readAligned(cerudataDir, pattern, type="SolexaExport",
> + filter=filt)
>> alns_8
> class: AlignedRead
> length: 1022848 reads; width: 35 cycles
> chromosome: chr1.fas chr1.fas ... chr1.fas chr1.fas
> position: 7568294 167488 ... 4687256 5376960
> strand: + + ... + +
> alignQuality: NumericQuality
> alignData varLabels: run lane ... filtering contig
>
>> levels(alns_8 at chromosome) <- sub(".fa$", "", levels(chromosome(alns_8)))
>
>> head(levels(alns_8 at chromosome))
> [1] "chr1.fas"
>> levels(chromosome(alns_8))
> [1] "chr1.fas"
>> library(BSgenome.Athaliana.TAIR.04232008)
>> arab.chromlens <- seqlengths(Athaliana)
>> head(arab.chromlens)
> chr1 chr2 chr3 chr4 chr5 chrC
> 30432563 19705359 23470805 18585042 26992728 154478
>> names(arab.chromlens)
> [1] "chr1" "chr2" "chr3" "chr4" "chr5" "chrC" "chrM"
>> cov.arab8 <- coverage(alns_8, width = arab.chromlens, extend = 126L)
> Error: UserArgumentMismatch
> 'names(width)' (or 'names(end)') mismatch with 'levels(chromosome(x))'
> see ?"AlignedRead-class"
>
>> sessionInfo()
> R version 2.10.1 (2009-12-14)
> x86_64-pc-linux-gnu
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] BSgenome.Athaliana.TAIR.04232008_1.3.16
> [2] chipseq_0.2.1
> [3] ShortRead_1.4.0
> [4] lattice_0.17-26
> [5] BSgenome_1.14.2
> [6] Biostrings_2.14.10
> [7] IRanges_1.4.9
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.6.1 grid_2.10.1 hwriter_1.1
>>
>
>
> Thanks,
> P. Terry
> pterry at huskers.unl.edu
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
More information about the Bioc-sig-sequencing
mailing list