[BioC] IlluminaMousev2.db probe quality information questions?
Mark Dunning
mark.dunning at gmail.com
Thu Mar 1 17:48:20 CET 2012
Hi Lourdes,
Sorry for taking so long to get back to you. I went away for a few days
and somehow managed to miss your message.
Thanks for your interest in the packages! The probe quality scores are
derived from our mapping of probes to the genome and the transcriptome
using an in-house Perl script. The *'s indicate issues in
consolidating the genomic and transcriptomic matches. Here is the full
explanation:
"Perfect/Good*** no CDS annotation - this can occur where all the
transcript alignment matches are to the reverse strand and/or are GenBank
entries for which we have no 5pUTR/3pUTR/CDS annotation."
i.e. the probe was found to match a transcript, but there is
insufficient information to class it as 3pUTR/5pUTR. The transcript
may be unreliable.
"Perfect/Good**** mismatches for transcript alignment to the genome -
mismatches for transcript alignments to the genome are taken from the UCSC
annotation tables refSeqAli and all_mrna; **** is attached when the probe
quality is Perfect or Good, the genomic coordinates of the best match
from a BLAST search against the transcript databases differ from those of
the best match from a BLAST search against the reference genome, and there
is a mismatch in the transcript alignment to the genome."
i.e. the probe matches a transcript, but the transcript does not map to
the genomic location that we expect.
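If you would rather treat the starred grades as flagged variants of the
base grade, something along these lines might help. It is only a sketch:
the counts are copied from the table in your message, and the names
base_grade, flag and by_grade are made up for the illustration.

```r
# Counts copied from the table() output in the quoted message.
counts <- c("Bad" = 3657, "Good" = 996, "Good***" = 38, "Good****" = 302,
            "No match" = 79, "Perfect" = 31819,
            "Perfect***" = 1719, "Perfect****" = 1047)

# Strip trailing asterisks to recover the base grade.
base_grade <- sub("\\*+$", "", names(counts))

# Number of asterisks: 0 = no flag, 3 = no CDS annotation,
# 4 = transcript/genome mismatch.
flag <- nchar(names(counts)) - nchar(base_grade)

# Total number of probes per base grade, ignoring the flags.
by_grade <- tapply(counts, base_grade, sum)
print(by_grade)
```

The same sub() call can be applied directly to a vector of per-probe
quality strings such as probe_quality_re.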
The missing Probe Quality values for those two probes are accidental. The
source file I use to compile the annotation packages contains the
following entries:
grep ILMN_1229593 Annotation_Illumina_Mouse_WG-6_V2_mm9_Sept2011.txt
ILMN_1229593 AACTGGCCCACCTTCAACACTCCCTCTAGGCACCCAGACCTCTAGTGGCA 50 chr15:63942585:63942634:- 15qD1 0 1-50
|||||||||||||||||||||||||||||||||||||||||||||||||| 50 100 100 NM_010026 1
of 1 (Asap1) uc007vzk.1 uc007vzj.1 uc007vzi.1 uc007vzh.1 4 of 6
(Asap1) BC094581 BC048818 BC002201 AK122477 AF075461 AK147689 6 of 381
(Asap1)6 X 6 6 6 6 6 6 6 6 7 ENSMUST00000110115 ENSMUST00000023008 2
of 3 (ENSMUSG00000022377) 65301463 63101607 28981428 12805456 28972685
4063613 74188670 NP_034156.2 Q9QWY8 Q9QWY8 No 1-50
|||||||||||||||||||||||||||||||||||||||||||||||||| 50 100 100 U92478 1-50
|||||||||||||||||||||||||
|||||||||||||||||||||||| 50 98 98 Asap1 ENSMUSG00000022377 Mm.27723613196 ArfGAP
with SH# domain, ankyrin repeat and PH
domain1 Yes Transcriptomic Yes 58 0 Perfect 006280286
grep ILMN_2694153 Annotation_Illumina_Mouse_WG-6_V2_mm9_Sept2011.txt
ILMN_2694153 GTTTAGATGAGTGGGTTTGTACATCTTATGGCGAGTGGCCACCCCTGAGA 50 chr15:63920345:63920394:- 15qD1 0 1-50
|||||||||||||||||||||||||||||||||||||||||||||||||| 50 100 100 NM_010026 1
of 1 (Asap1) uc007vzm.1 uc007vzl.1 uc007vzk.1 uc007vzj.1 uc007vzi.1
uc007vzh.1 6 of 6 (Asap1) U92478 BC094581 BC048818 BC002201 AK122477
AF075462 AF075461 AK166056 AK159048 AK146545 BB821218 AK147689 11 of
381 (Asap1) 1 1 1 X 1 1 1 1 1 1 1 1 1 1 X 1 X X 1 ENSMUST00000110114
ENSMUST00000110115 ENSMUST00000023008 3 of 3
(ENSMUSG00000022377) 65301463 1928965 63101607 28981428 12805456
28972685 4063615 4063613 74141548 74186632 74138896 16993847
74188670 NP_034156.2 Q9QWY8 Q9QWY8 Q9QWY8 Q9QWY8 No 1-50
|||||||||||||||||||||||||||||||||||||||||||||||||| 50 100 100 Asap1 ENSMUSG00000022377 Mm.27723613196 ArfGAP
with SH# domain, ankyrin repeat and PH
domain1 Yes Transcriptomic Yes 50 0 Perfect 001010528
There is a # character in the description, and by default R treats
everything that follows it on the line as a comment, so the remaining
fields (including the probe quality) are never read in. I
shall correct this in future versions of the annotation. Thanks for
spotting this. Both probes are Perfect btw.
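To illustrate the effect, here is a minimal sketch. It assumes the file is
read with read.table(), which may not be exactly what my build scripts do,
and the three-field line is a made-up excerpt rather than a real record
from the annotation file.

```r
# Hypothetical three-field excerpt; the real records have many more columns.
line <- "ILMN_1229593\tArfGAP with SH# domain\tPerfect"

# Default comment.char = "#": everything from the # onwards is dropped,
# so the quality field is lost.
default_read <- read.table(text = line, sep = "\t", quote = "",
                           stringsAsFactors = FALSE)

# comment.char = "" disables comment handling, so # is kept as literal text.
literal_read <- read.table(text = line, sep = "\t", quote = "",
                           comment.char = "", stringsAsFactors = FALSE)

print(ncol(default_read))   # number of fields seen with the default
print(ncol(literal_read))   # number of fields seen with comments disabled
print(literal_read[[3]])    # the recovered quality field
```

Setting quote = "" as well avoids similar surprises from stray quote
characters in free-text description fields.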
Regards,
Mark
On Tue, Feb 14, 2012 at 7:01 PM, Lourdes Peña Castillo
<lourdes.pena at gmail.com> wrote:
> Hello,
>
> I am using the re-annotation of Illumina probe sequences available in the
> IlluminaMousev2.db (great package!), and I have two questions (please see
> code below as well):
>
> 1) Is there any difference between Good and Good*** or Perfect and
> Perfect**** probe quality?
>
> 2) I noticed there are two probes re-annotated to an EntrezID without probe
> quality, why would this be?
>
> Thanks!
>
> Lourdes
>
>> library("illuminaMousev2.db")
>
>> x <- illuminaMousev2ENTREZREANNOTATED
>
>> mapped_probes <- mappedkeys(x)
>
>> xx <- as.list(x[mapped_probes])
>
>> probe_EntrezID_re <- unlist(xx)
>
>>
>
>> x <- illuminaMousev2PROBEQUALITY
>
>> mapped_probes <- mappedkeys(x)
>
>> # Convert to a list
>
>> xx <- as.list(x[mapped_probes])
>
>> probe_quality_re <- unlist(xx)
>
>>
>
>> table(probe_quality_re[intersect(names(probe_EntrezID_re),
> names(probe_quality_re))])
>
>         Bad        Good     Good***    Good****    No match     Perfect
>        3657         996          38         302          79       31819
>  Perfect*** Perfect****
>        1719        1047
>
>>
>
>> setdiff(names(probe_EntrezID_re), names(probe_quality_re))
>
> [1] "ILMN_1229593" "ILMN_2694153"
>
>> probe_quality_re[c("ILMN_1229593", "ILMN_2694153")]
>
> <NA> <NA>
>
> NA NA
>
>>
>
>> sessionInfo()
>
> R version 2.14.1 (2011-12-22)
>
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
>
> locale:
>
> [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
>
>
> attached base packages:
>
> [1] grid stats graphics grDevices utils datasets methods
> base
>
>
> other attached packages:
>
> [1] gplots_2.10.1 KernSmooth_2.23-7 caTools_1.12
> bitops_1.0-4.1
>
> [5] gdata_2.8.2 gtools_2.6.2 limma_3.10.2
> illuminaMousev2.db_1.12.1
>
> [9] org.Mm.eg.db_2.6.4 RSQLite_0.11.1 DBI_0.2-5
> AnnotationDbi_1.16.15
>
> [13] Biobase_2.14.0 BiocInstaller_1.2.1
>
>
> loaded via a namespace (and not attached):
>
> [1] IRanges_1.12.6 tools_2.14.1