[Bioc-sig-seq] Reads in 3'utr
Valerie Obenchain
vobencha at fhcrc.org
Tue Sep 27 04:45:13 CEST 2011
On 09/26/11 09:16, rohan bareja wrote:
Yes, you need to merge the counts for each gene.
see ?split
Split the counts on the gene IDs then sum with lapply. Something like,
lapply(split(df$counts, df$geneID), sum)
Valerie
> Hi Valerie,
>
> Thanks a lot..It worked finally.. So now I have a data frame for the
> geneIds ,TranscriptIds and the counts (3'utr) which is given below:
>
> GENE TX countsUTRctl
> [1,] "148398" "1121" "2"
> [2,] "339451" "1118" "0"
> [3,]"84069" "1116" "0"
> [4,] "84069" "1119" "11"
> [5,] "9636" "1126" "11"
> [6,] "375790" "1127" "0"
>
> Now I want to do differential expression of genes using DESeq,so do I
> have to merge the two same genes and its counts such as geneID 84069
> (from above ) or i can proceed with the above dataframe?
> If I have to merge them how do I do that?
>
> Thanks,
> Rohan
>
> --- On *Sat, 24/9/11, Valerie Obenchain /<vobencha at fhcrc.org>/* wrote:
>
>
> From: Valerie Obenchain <vobencha at fhcrc.org>
> Subject: Re: [Bioc-sig-seq] Reads in 3'utr
> To: "rohan bareja" <rohan_1925 at yahoo.co.in>
> Cc: bioc-sig-sequencing at r-project.org
> Date: Saturday, 24 September, 2011, 4:24 AM
>
> On 09/23/2011 02:57 PM, rohan bareja wrote:
>> Hi,
>>
>> utr=threeUTRsByTranscript(txdb,use.names=FALSE)
>> So,utr is GRangesList of length 33381
>> Then as u said,I did the following:
>>
>> txBygene <- transcriptsBy(txdb, "gene")
>> geneID <- rep(names(txBygene), elementLengths(txBygene))
>> df <- data.frame(geneID=geneID,
>> txID=values(unlist(txBygene))[["tx_id"]])
>>
>> This gives me a dataframe with 40,780 rows with gene ID and txID
>> from txBygene object.
>> geneID txID
>> 40775 9994 11731
>> 40776 9994 11730
>> 40777 9997 38491
>> 40778 9997 38489
>> 40779 9997 38496
>> 40780 9997 38497
>>
>> Since my utr object is of length 33,381 ,my counts length is same
>> i.e 33,381
>> So I am not able to map the counts to the above data frame which
>> has transcript and gene IDs.
>>
>
> Yes, these lengths are different.
>
> In this example we have utr regions from 58 transcripts.
>
> > length(utr)
> [1] 58
>
>
> Those 58 transcripts can be matched to their gene ID's by looking
> at the txBygene object. All of the transcripts fall into one (or
> more) of 51 genes,
>
> > length(txBygene)
> [1] 51
>
> There are multiple transcripts per gene so we expand the gene ID's
> to map to the transcripts.
>
> > dim(df)
> [1] 79 2
>
> This data.frame has all transcripts from the txdb mapped to the
> gene ID's. Your utr data may contain only a subset of these
> transcripts. That is something you need to check. Match the
> desired transcript names to the df, pull out the gene IDs. You
> then have the gene ID's for your utr regions and can split or
> group your counts by gene.
>
> Valerie
>>
>>
>>
>> --- On *Fri, 23/9/11, Valerie Obenchain /<vobencha at fhcrc.org>
>> </mc/compose?to=vobencha at fhcrc.org>/*wrote:
>>
>>
>> From: Valerie Obenchain <vobencha at fhcrc.org>
>> </mc/compose?to=vobencha at fhcrc.org>
>> Subject: Re: [Bioc-sig-seq] Reads in 3'utr
>> To: "rohan bareja" <rohan_1925 at yahoo.co.in>
>> </mc/compose?to=rohan_1925 at yahoo.co.in>
>> Cc: bioc-sig-sequencing at r-project.org
>> </mc/compose?to=bioc-sig-sequencing at r-project.org>
>> Date: Friday, 23 September, 2011, 10:50 PM
>>
>> Hi Rohan,
>>
>> You can relate the counts for 3UTR regions to gene IDs
>> through the transcript IDs.
>>
>> txdb_file <- system.file("extdata",
>> "UCSC_knownGene_sample.sqlite", package="GenomicFeatures")
>> txdb <- loadFeatures(txdb_file)
>> utr=threeUTRsByTranscript(txdb,use.names=FALSE)
>>
>>
>> The transcript names can be matched to the gene ID's through,
>>
>> txBygene <- transcriptsBy(txdb, "gene")
>> geneID <- rep(names(txBygene), elementLengths(txBygene))
>> df <- data.frame(geneID=geneID,
>> txID=values(unlist(txBygene))[["tx_id"]])
>>
>> Now you know what gene ID each tx count belongs to. You can
>> split your counts by gene ID ...
>>
>>
>> Valerie
>>
>>
>>
>> On 09/20/2011 12:13 PM, rohan bareja wrote:
>>> Hi everyone,
>>> I am doing NGS analysis using bam files.I have counted reads in 3'utr region using
>>> utr=threeUTRsByTranscript(txdb,use.names=FALSE)
>>> countsUTR<- countOverlaps(utr,reads)
>>> I have got the transcript level counts from this.How can I get the gene level counts??It might sound silly but Does anybody have an idea on what type of anaylses we can do from this countsUTR ?
>>> Thanks,Rohan
>>> [[alternative HTML version deleted]]
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioc-sig-sequencing mailing list
>>> Bioc-sig-sequencing at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>
More information about the Bioc-sig-sequencing
mailing list