[BioC] problems with strand in predictCoding
Valerie Obenchain
vobencha at fhcrc.org
Thu Apr 26 08:59:25 CEST 2012
Hi all,
Thanks for the feedback and discussion on this topic. A patch for
predictCoding() has been applied to 1.2.7 release/1.3.9 devel. In the
devel branch I've created a subfunction of predictCoding() for more
convenient testing. It accepts a GRangesList (expected to be the
cdsbytx). The behavior is as follows,
library(BSgenome.Hsapiens.UCSC.hg19)
fun <- VariantAnnotation:::.predictCodingGRangesList
cdsbytx <- GRangesList(tx1=GRanges(seqnames="chr1",
IRanges(c(10001, 10010), width=5),
strand="+",
cds_rank=c(1, 2)),
tx2=GRanges(seqnames="chr1",
IRanges(c(10100, 10001), width=5),
strand="-",
cds_rank=c(1,2)))
variant=DNAStringSet(c("G", "G", "C", "T", "G"))
query <- GRanges(seqnames="chr1",
ranges=IRanges(c(rep(10003, 3), 10011,
10101), width=1),
strand=c("+", "-", "*", "*", "*"),
variant=variant)
names(query) <- LETTERS[1:5]
coding <- fun(query, cdsbytx, Hsapiens, variant)
> coding
GRanges with 6 ranges and 13 elementMetadata cols:
seqnames ranges strand | variant varAllele
CDSLOC
<Rle> <IRanges> <Rle> | <DNAStringSet> <DNAStringSet> <IRanges>
A chr1 [10003, 10003] + | G G
[3, 3]
B chr1 [10003, 10003] - | G C
[8, 8]
C chr1 [10003, 10003] + | C C
[3, 3]
C chr1 [10003, 10003] - | C G
[8, 8]
D chr1 [10011, 10011] + | T T
[7, 7]
E chr1 [10101, 10101] - | G C
[4, 4]
PROTEINLOC QUERYID TXID CDSID GENEID
<CompressedIntegerList> <integer> <character> <character> <character>
A 1 1 tx1 <NA> <NA>
B 3 2 tx2 <NA> <NA>
C 1 3 tx1 <NA> <NA>
C 3 3 tx2 <NA> <NA>
D 3 4 tx1 <NA> <NA>
E 2 5 tx2 <NA> <NA>
CONSEQUENCE REFCODON VARCODON REFAA VARAA
<factor> <DNAStringSet> <DNAStringSet> <AAStringSet> <AAStringSet>
A synonymous TAA TAG * *
B nonsynonymous GTT GCT V A
C nonsynonymous TAA TAC * Y
C nonsynonymous GTT GGT V G
D nonsynonymous CCT TCT P S
E nonsynonymous GGG CGG G R
Note the strand of the output reflects the strand of the subject that
was hit. query "C" is unstranded and hits a subject on both the "+" and
"-" strand.
A few more changes you should be aware of in the devel branch,
- Column names of the output have changed to be consistent with select()
in AnnotationDbi.
- If the variant is nonsynonymous and the VARCODON has a stop codon not
present in the REFCODON the CONSEQUENCE is now 'nonsense'
instead of 'nonsynonymous'. I'm working on annotating the loss of start
codons as well.
Valerie
On 04/25/12 08:28, Valerie Obenchain wrote:
> On 04/25/2012 12:28 AM, Bernd wrote:
>> Hi there,
>>
>> Jeremiah Degenhardt<degenhardt.jeremiah at ...> writes:
>>
>>> Hello BioC,
>>>
>>> I am using the predictCoding function in the VariantAnnotation package
>>> and have run into a few issues.
>>>
>>> The first issue that I found is a small one. While I can supply any
>>> txdb object that I want to the predictCoding, the function seems to
>>> still be looking for TxDb.Hsapiens.UCSC.hg19.knownGene. Running
>>> predict coding without this package installed results in the following
>>> error:
>>>
>>> Error in isCircular(TxDb.Hsapiens.UCSC.hg19.knownGene) :
>>> error in evaluating the argument 'x' in selecting a method for
>>> function 'isCircular': Error: object
>>> ''
>> Since I am using the library TxDb.Mmusculus.UCSC.mm9.knownGene this
>> is a problem
>> for my analysis of snps in mice. I get the same error by using this
>> code:
>>
>> library(BSgenome.Mmusculus.UCSC.mm9)
>> library(TxDb.Mmusculus.UCSC.mm9.knownGene)
>>
>> txdb = TxDb.Mmusculus.UCSC.mm9.knownGene
>>
>> snp1 = GRanges(seqnames="chr2", ranges=IRanges(start=28920573, width=1),
>> strand="+", varAllele=DNAStringSet("A"))
>>
>> predictCoding(query=snp1, subject=txdb, seqSource=Mmusculus,
>> varAllele=values(snp1)$varAllele)
>>
>> Is there any solution?
>> Tanks a lot!
>>
>
> The txdb error has been fixed in VariantAnnotation 1.2.6 in release
> and 1.3.8 in devel. I'm currently working on the predictCoding strand
> issue and will post back here when it's fixed.
>
> Valerie
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list