[R] Logical statements and subsetting data...
ONKELINX, Thierry
Thierry.ONKELINX at inbo.be
Mon Feb 25 15:28:33 CET 2008
Your negation of Height.1 == 0 & Height.2 == 0 was incorrect: by
De Morgan's law, the negation of (A & B) is (!A | !B), not (!A & !B).
Use
subset(raw.all.clean, !(Height.1 == 0 & Height.2 == 0))
or
subset(raw.all.clean, Height.1 != 0 | Height.2 != 0)
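As a minimal sketch of the difference, the two forms above can be compared with the original attempt on a toy data frame. The data frame `toy`, its `id` column and all of its values are invented for illustration and are not taken from raw.all.clean.

```r
# Toy data (hypothetical values, not the poster's data)
toy <- data.frame(
  id       = 1:4,
  Height.1 = c(0, 0, 5, 7),
  Height.2 = c(0, 3, 0, 9)
)

# Correct: negate the whole condition ...
keep.not <- subset(toy, !(Height.1 == 0 & Height.2 == 0))
# ... or, equivalently by De Morgan's law, OR the negated parts
keep.or  <- subset(toy, Height.1 != 0 | Height.2 != 0)
# Incorrect: drops every row in which either height is zero
keep.and <- subset(toy, Height.1 != 0 & Height.2 != 0)

keep.not$id   # 2 3 4
keep.or$id    # 2 3 4
keep.and$id   # 4
```

Only row 1 has both heights equal to zero, so the first two calls drop just that row, while the third also drops rows 2 and 3.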
HTH,
Thierry
----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be
Do not put your faith in what statistics say until you have carefully
considered what they do not say. ~William W. Watt
A statistical analysis, properly conducted, is a delicate dissection of
uncertainties, a surgery of suppositions. ~M.J.Moroney
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On behalf of Neil Shephard
Sent: Monday 25 February 2008 15:21
To: r-help
Subject: [R] Logical statements and subsetting data...
Hi,
I'm scratching my head as to why I can't use the subset() command to
remove one line of data from a data frame.
There is just one row (out of 45840) that I'd like to remove and it
can be identified using....
> dim(raw.all.clean)
[1] 45840 10
> subset(raw.all.clean, Height.1 == 0 & Height.2 == 0)
Sample.Name Well SNP Allele.1 Allele.2 Size.1 Size.2 Height.1
47068 CA0153 O02 rs2106776 NA NA 0
Height.2 Pool
47068 0 3
(Note that the row index of 47068 is higher than the number of rows
reported by dim() simply because I have already removed a number of
rows.)
So I want to remove this one instance where Height.1 == 0 & Height.2
== 0. I'd have thought that a logical expression where Height.1 != 0
& Height.2 != 0 would have achieved this, but it doesn't seem to
correctly drop out this one observation; instead it's dropping out far
more observations...
> t <- subset(raw.all.clean, Height.1 != 0 & Height.2 != 0)
> dim(t)
[1] 38150 10
Thus 7690 rows have been removed. It seems that the '&'
operator is being interpreted as an 'OR' (|) since...
> dim(subset(raw.all.clean, Height.1 != 0))
[1] 42152 10
> dim(subset(raw.all.clean, Height.2 != 0))
[1] 41837 10
...and...
> dim(raw.all.clean) - dim(subset(raw.all.clean, Height.1 != 0))
[1] 3688 0
> dim(raw.all.clean) - dim(subset(raw.all.clean, Height.2 != 0))
[1] 4003 0
> 3688 + 4003
[1] 7691
(This is one more than the number of rows being removed, but given
that there is one sample where both Height.1 and Height.2 are '0',
that's fine.)
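The arithmetic above is what one would expect if the condition Height.1 != 0 & Height.2 != 0 removes every row in which either height is zero. A rough check using only the counts reported in this message (the variable names are invented here, and it is assumed that neither height column contains NA values):

```r
# Counts taken from the dim() output quoted above
n.total   <- 45840           # rows in raw.all.clean
n.h1.zero <- 45840 - 42152   # rows with Height.1 == 0 (3688)
n.h2.zero <- 45840 - 41837   # rows with Height.2 == 0 (4003)
n.both    <- 1               # the single row with both heights zero

# Rows where at least one height is zero; the row with both
# heights zero is counted twice in the sum, so subtract it once
n.either <- n.h1.zero + n.h2.zero - n.both
n.either             # 7690

# Rows left when both heights must be non-zero
n.total - n.either   # 38150
```

Both figures agree with the 7690 removed rows and the 38150 remaining rows reported for t.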
I thought I understood how logical expressions are constructed, and
have gone back and read the entries on precedence, but can't work out
why the above is happening?
What's particularly perplexing (to me) is that the test for exact
equality works, but not for inequality?
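One way to see why the two tests behave differently is to enumerate the four possible cases. This is only an illustrative sketch; the column names in `cases` are made up for the example.

```r
# All four combinations of "Height.1 is zero" and "Height.2 is zero"
cases <- expand.grid(h1.zero = c(TRUE, FALSE), h2.zero = c(TRUE, FALSE))

# The equality test: both heights are zero
cases$both.zero    <- cases$h1.zero & cases$h2.zero
# Its true complement: not (both heights are zero)
cases$negated      <- !cases$both.zero
# The inequality test as written: both heights are non-zero
cases$both.nonzero <- !cases$h1.zero & !cases$h2.zero

cases
```

The `negated` column is TRUE in three of the four cases, whereas `both.nonzero` is TRUE in only one, so the inequality test is a stricter condition and not the complement of the equality test.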
I feel like I'm missing something blatantly obvious, but can't work
out what it is.
Cheers,
Neil
--
Email - nshephard at gmail.com / n.shephard at sheffield.ac.uk