[Bioc-sig-seq] drop factor level for seqnames of a GRanges object

Hervé Pagès hpages at fhcrc.org
Thu Oct 7 02:14:53 CEST 2010


Hi Michael, Chris,

Yes it seems that at the moment seqlengths<-(), the replacement form
of seqlengths(), doesn't let me remove levels, only add new ones:

 > gr
GRanges with 2 ranges and 0 elementMetadata values
     seqnames    ranges strand |
        <Rle> <IRanges>  <Rle> |
[1]        A   [5, 11]      * |
[2]        B   [6, 12]      * |

seqlengths
   A  B  C
  NA NA NA

 > seqlengths(gr) <- seqlengths(gr)[-3]
 > gr
GRanges with 2 ranges and 0 elementMetadata values
     seqnames    ranges strand |
        <Rle> <IRanges>  <Rle> |
[1]        A   [5, 11]      * |
[2]        B   [6, 12]      * |

seqlengths
   A  B  C
  NA NA NA

 > seqlengths(gr) <- c(seqlengths(gr), D=40L)
 > gr
GRanges with 2 ranges and 0 elementMetadata values
     seqnames    ranges strand |
        <Rle> <IRanges>  <Rle> |
[1]        A   [5, 11]      * |
[2]        B   [6, 12]      * |

seqlengths
   A  B  C  D
  NA NA NA 40

I agree with Michael that it would be nice to be able to do

   seqlengths(gr) <- seqlengths(gr)[-3]

and have level C actually dropped. And of course

   seqlengths(gr) <- seqlengths(gr)[-1]

should fail (one should not be allowed to drop levels that are used).

Cheers,
H.


On 10/06/2010 02:30 PM, Michael Lawrence wrote:
> At the moment, this is extremely difficult and requires obscure hacks to
> achieve. It's been brought up numerous times.
>
> It happens all the time when comparing gene annotations to read alignments.
> The gene annotations from GenomicFeatures include the entire set
> chromosomes, which is nice, but the sequence reads are usually loaded in a
> genome-independent way and only have the chromosomes present in the data. It
> would be best to add the additional chromosomes to the read dataset, even if
> there are no mappings, rather than throw away intervals from the other set.
>
> The main problem is that the seqlengths and seqnames need to updated
> simultaneously.  Really one should take precedence over the other, and I
> recently changed the GRanges constructor to favor seqlengths, so that the
> levels of seqnames are set to the names of seqlengths. Thus, it would be
> nice if one could call seqlengths<- and have it add the necessary levels to
> seqnames automatically.  Then you can just do:
>
> seqlengths(reads)<- seqlengths(genes)
>
> and be done with it. Of course, if seqnames contains values that are not
> present in the new seqlengths, it should fail.
>
> Comments? Should I make it work this way?
>
> Michael
>
> On Wed, Oct 6, 2010 at 2:10 PM, Chris Seidel<seidel at phaget4.org>  wrote:
>
>> Hello,
>>
>> How do I remove a factor level for the sequence names of a GRanges object?
>>
>> Sometimes I work with data sets that have not been aligned to all the
>> same chromosomes. To make them comparable I have to remove or ignore
>> reads for the odd chromosome. So if I have a GRanges object for which I
>> want to remove all reads matching a given chromosome, e.g.:
>>
>> gr<- gr[seqnames(gr) != "chrXHet"]
>>
>> levels(seqnames(gr)) will return all original seqnames, but I would like
>> to exclude "chrXHet" since I've removed all the reads matching that
>> chromosome. How do I do this?
>>
>> -Chris
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioc-sig-sequencing mailing list