[Bioc-sig-seq] Reads in 3'utr

Valerie Obenchain vobencha at fhcrc.org
Thu Sep 29 04:56:34 CEST 2011

DESeq and edgeR vignettes.
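
For the percentage question quoted below, a rough first look is possible in base R before any formal testing. This is only a sketch under stated assumptions: `utrCounts` is a hypothetical data frame with one row per 3'UTR isoform, the gene and transcript IDs are borrowed from the example output further down the thread, and the counts are invented.

```r
# Hypothetical per-isoform counts; the numbers are made up for illustration.
utrCounts <- data.frame(
  geneID = c("9994", "9994", "9997", "9997", "9997"),
  txID   = c("11731", "11730", "38491", "38489", "38496"),
  count  = c(30, 10, 5, 15, 0)
)

# Total reads per gene, repeated for every isoform of that gene.
geneTotal <- ave(utrCounts$count, utrCounts$geneID, FUN = sum)

# Percentage of the gene's 3'UTR reads assigned to each isoform;
# genes with zero reads get NA instead of a division by zero.
utrCounts$pct <- ifelse(geneTotal > 0,
                        100 * utrCounts$count / geneTotal,
                        NA)

print(utrCounts)
```

Computing the table once per condition (disease, control) gives proportions that can be compared side by side. Whether a difference is significant is a separate question that the raw percentages do not answer.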


On 09/28/11 19:46, rohan bareja wrote:
> Hi,
> I have now summed the counts to the gene level for the 3'UTRs.
> Next I want to assess the relative usage of each 3'UTR end, i.e.
> what percentage of a gene's reads comes from each 3'UTR isoform.
> I also want to identify the different 3'UTR ends for each gene so
> that I can compare alternative 3'UTR usage (disease vs control).
> Do you have any idea how to proceed?
> Thanks,
> Rohan
> --- On *Sat, 24/9/11, Valerie Obenchain /<vobencha at fhcrc.org>/*wrote:
>     From: Valerie Obenchain <vobencha at fhcrc.org>
>     Subject: Re: [Bioc-sig-seq] Reads in 3'utr
>     To: "rohan bareja" <rohan_1925 at yahoo.co.in>
>     Cc: bioc-sig-sequencing at r-project.org
>     Date: Saturday, 24 September, 2011, 4:24 AM
>     On 09/23/2011 02:57 PM, rohan bareja wrote:
>>     Hi,
>>     utr <- threeUTRsByTranscript(txdb, use.names=FALSE)
>>     So utr is a GRangesList of length 33,381.
>>     Then, as you said, I did the following:
>>     txBygene <- transcriptsBy(txdb, "gene")
>>     geneID <- rep(names(txBygene), elementLengths(txBygene))
>>     df <- data.frame(geneID=geneID,
>>                      txID=values(unlist(txBygene))[["tx_id"]])
>>     This gives me a data frame with 40,780 rows holding the gene ID
>>     and txID from the txBygene object. The last rows are:
>>           geneID  txID
>>     40775   9994 11731
>>     40776   9994 11730
>>     40777   9997 38491
>>     40778   9997 38489
>>     40779   9997 38496
>>     40780   9997 38497
>>     Since my utr object has length 33,381, my counts vector has the
>>     same length, i.e. 33,381.
>>     So I am not able to map the counts to the above data frame,
>>     which has 40,780 rows of transcript and gene IDs.
>     Yes, these lengths are different.
>     In this example we have utr regions from 58 transcripts.
>     > length(utr)
>     [1] 58
>     Those 58 transcripts can be matched to their gene IDs by looking
>     at the txBygene object. All of the transcripts fall into one (or
>     more) of 51 genes,
>     > length(txBygene)
>     [1] 51
>     There are multiple transcripts per gene, so we expand the gene IDs
>     to map to the transcripts.
>     > dim(df)
>     [1] 79  2
>     This data.frame has all transcripts from the txdb mapped to the
>     gene IDs. Your utr data may contain only a subset of these
>     transcripts. That is something you need to check. Match the
>     transcript IDs of your utr regions to the txID column of df and
>     pull out the corresponding gene IDs. You then have the gene IDs
>     for your utr regions and can split or group your counts by gene.
>     Valerie
>>     --- On *Fri, 23/9/11, Valerie Obenchain /<vobencha at fhcrc.org>/*wrote:
>>         From: Valerie Obenchain <vobencha at fhcrc.org>
>>         Subject: Re: [Bioc-sig-seq] Reads in 3'utr
>>         To: "rohan bareja" <rohan_1925 at yahoo.co.in>
>>         Cc: bioc-sig-sequencing at r-project.org
>>         Date: Friday, 23 September, 2011, 10:50 PM
>>         Hi Rohan,
>>         You can relate the counts for the 3'UTR regions to gene IDs
>>         through the transcript IDs.
>>             txdb_file <- system.file("extdata", "UCSC_knownGene_sample.sqlite",
>>                                      package="GenomicFeatures")
>>             txdb <- loadFeatures(txdb_file)
>>             utr <- threeUTRsByTranscript(txdb, use.names=FALSE)
>>         The transcript IDs can be matched to the gene IDs through
>>             txBygene <- transcriptsBy(txdb, "gene")
>>             geneID <- rep(names(txBygene), elementLengths(txBygene))
>>             df <- data.frame(geneID=geneID,
>>                              txID=values(unlist(txBygene))[["tx_id"]])
>>         Now you know which gene ID each transcript count belongs to.
>>         You can split your counts by gene ID ...
>>         Valerie
>>         On 09/20/2011 12:13 PM, rohan bareja wrote:
>>>         Hi everyone,
>>>         I am doing NGS analysis using BAM files. I have counted reads in the 3'UTR regions using
>>>         utr <- threeUTRsByTranscript(txdb, use.names=FALSE)
>>>         countsUTR <- countOverlaps(utr, reads)
>>>         This gives me transcript-level counts. How can I get gene-level counts? It might sound silly, but does anybody have an idea of what types of analyses we can do with countsUTR?
>>>         Thanks,
>>>         Rohan
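
The matching step described in the thread (look up each counted transcript in df, then group the counts by gene) can be sketched in base R. The sketch rests on assumptions: `df` and `countsUTR` below are small hypothetical stand-ins for the real objects, and the counts are assumed to be named by transcript ID. If the real vector is unnamed, `names(utr)` can supply the IDs, since countOverlaps returns one value per element of utr.

```r
# Hypothetical transcript-to-gene map, shaped like the df in the thread.
df <- data.frame(geneID = c("9994", "9994", "9997", "9997", "9997"),
                 txID   = c(11731, 11730, 38491, 38489, 38496),
                 stringsAsFactors = FALSE)

# Hypothetical transcript-level counts. Only a subset of the transcripts
# in df is present, mirroring the length mismatch discussed in the thread.
countsUTR <- c("11731" = 10, "38491" = 4, "38489" = 6)

# Find the row of df for each counted transcript. A transcript missing
# from df would yield NA and should be inspected before summing.
idx <- match(names(countsUTR), as.character(df$txID))
stopifnot(!anyNA(idx))
geneOfTx <- df$geneID[idx]

# Sum the transcript-level counts within each gene.
geneCounts <- tapply(countsUTR, geneOfTx, sum)
print(geneCounts)
```

One caveat with this approach: a read that overlaps the 3'UTRs of several transcripts of the same gene is counted once per transcript, so the gene-level sum can exceed the number of distinct reads.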
