[Bioc-sig-seq] drop factor level for seqnames of a GRanges object
Hervé Pagès
hpages at fhcrc.org
Thu Oct 7 02:14:53 CEST 2010
Hi Michael, Chris,
Yes it seems that at the moment seqlengths<-(), the replacement form
of seqlengths(), doesn't let me remove levels, only add new ones:
> gr
GRanges with 2 ranges and 0 elementMetadata values
seqnames ranges strand |
<Rle> <IRanges> <Rle> |
[1] A [5, 11] * |
[2] B [6, 12] * |
seqlengths
A B C
NA NA NA
> seqlengths(gr) <- seqlengths(gr)[-3]
> gr
GRanges with 2 ranges and 0 elementMetadata values
seqnames ranges strand |
<Rle> <IRanges> <Rle> |
[1] A [5, 11] * |
[2] B [6, 12] * |
seqlengths
A B C
NA NA NA
> seqlengths(gr) <- c(seqlengths(gr), D=40L)
> gr
GRanges with 2 ranges and 0 elementMetadata values
seqnames ranges strand |
<Rle> <IRanges> <Rle> |
[1] A [5, 11] * |
[2] B [6, 12] * |
seqlengths
A B C D
NA NA NA 40
I agree with Michael that it would be nice to be able to do
seqlengths(gr) <- seqlengths(gr)[-3]
and have level C actually dropped. And of course
seqlengths(gr) <- seqlengths(gr)[-1]
should fail (one should not be allowed to drop levels that are used).
Cheers,
H.
On 10/06/2010 02:30 PM, Michael Lawrence wrote:
> At the moment, this is extremely difficult and requires obscure hacks to
> achieve. It's been brought up numerous times.
>
> It happens all the time when comparing gene annotations to read alignments.
> The gene annotations from GenomicFeatures include the entire set
> chromosomes, which is nice, but the sequence reads are usually loaded in a
> genome-independent way and only have the chromosomes present in the data. It
> would be best to add the additional chromosomes to the read dataset, even if
> there are no mappings, rather than throw away intervals from the other set.
>
> The main problem is that the seqlengths and seqnames need to updated
> simultaneously. Really one should take precedence over the other, and I
> recently changed the GRanges constructor to favor seqlengths, so that the
> levels of seqnames are set to the names of seqlengths. Thus, it would be
> nice if one could call seqlengths<- and have it add the necessary levels to
> seqnames automatically. Then you can just do:
>
> seqlengths(reads)<- seqlengths(genes)
>
> and be done with it. Of course, if seqnames contains values that are not
> present in the new seqlengths, it should fail.
>
> Comments? Should I make it work this way?
>
> Michael
>
> On Wed, Oct 6, 2010 at 2:10 PM, Chris Seidel<seidel at phaget4.org> wrote:
>
>> Hello,
>>
>> How do I remove a factor level for the sequence names of a GRanges object?
>>
>> Sometimes I work with data sets that have not been aligned to all the
>> same chromosomes. To make them comparable I have to remove or ignore
>> reads for the odd chromosome. So if I have a GRanges object for which I
>> want to remove all reads matching a given chromosome, e.g.:
>>
>> gr<- gr[seqnames(gr) != "chrXHet"]
>>
>> levels(seqnames(gr)) will return all original seqnames, but I would like
>> to exclude "chrXHet" since I've removed all the reads matching that
>> chromosome. How do I do this?
>>
>> -Chris
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
More information about the Bioc-sig-sequencing
mailing list