[R] Removing rows if certain elements are found in character string

Rui Barradas ruipbarradas at sapo.pt
Tue Jul 3 10:06:41 CEST 2012


Hello,

Inline.

Em 03-07-2012 01:15, jim holtman escreveu:
> You will have to change the 'i1' expression as follows:
>
>> i1 <- grepl("^([0D]|[0d])*$", dd$ch)
>> i1  # matches strings with d & D in them
>   [1]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>> # second string had 'd' & 'D' in it so it was TRUE above and FALSE below
>> i1new <- grepl("^([0D]*$|[0d]*$)", dd$ch)
>> i1new
>   [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>>

Right, apparently I forgot that the alternation is chosen anew on each
repetition of the group, and my test cases were not complete.
>>
>
> I put a 'd' and 'D' in the second string and the original regular
> expression is equivalent to
>
> grepl("^[0dD]*$", dd$ch)
>
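A quick check of the equivalence Jim describes. This is only a sketch: the vector 'x' holds three made-up strings, not the OP's data.

```r
# Hypothetical test strings: only 0/D, only 0/d, and a mix of d and D
x <- c("00D00", "00d00", "0dD00")

# The alternative is picked again on every repetition of the group,
# so this pattern accepts any mix of '0', 'd' and 'D'
old  <- grepl("^([0D]|[0d])*$", x)
same <- grepl("^[0dD]*$", x)

# Anchoring each alternative to the end keeps 'd' and 'D' apart
new <- grepl("^([0D]*$|[0d]*$)", x)

identical(old, same)  # the two patterns flag the same strings
new                   # the mixed string is no longer flagged
```
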

That covers only the first request. It does not handle strings that
contain characters other than '0', 'd' or 'D' but whose first non-zero
character is 'd' or 'D'. That is the case of my 4th row, which I changed
from the OP's data example.

My regular expression for 'i2' is equivalent to this one, which I believe
is more readable:


i2b <- grepl("^0{0,}[Dd]", dd$ch)


First a zero, which may occur zero or more times, then a 'd' or 'D';
whatever follows, up to the end of the string, is irrelevant.
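To confirm that the two spellings agree, here is a small sketch. The strings in 'x' are made up for illustration and are shorter than the OP's.

```r
# Hypothetical strings: D first, d first, T before D, no letters at all
x <- c("000D0", "0dT00", "00TD0", "00000")

i2  <- grepl("^0*[Dd]", x)     # quantifier written as *
i2b <- grepl("^0{0,}[Dd]", x)  # the same quantifier written as {0,}

identical(i2, i2b)  # both forms give the same logical vector
i2b                 # TRUE only where the first non-zero is 'd' or 'D'
```
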

> which will match strings containing d, D and 0.  If you only want 'd'
> or 'D' (and not both), then you will have to use the one in 'i1new'.
>

To the OP: bottom line, use Jim's 'i1new' and my 'i2' or 'i2b'.
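Put together, the two filters could be applied as below. This is a sketch only: 'small' is a made-up data frame with shortened strings, so adapt the names to your own data.

```r
# Hypothetical, shortened stand-in for the OP's data frame
small <- data.frame(
  ch = c("00D00", "00d00", "0T000", "0DT00", "0Td00", "0dD00"),
  count = c(0.007368, 0.002456, 0.007368, 0.007368, 0.002456, 0.002456),
  stringsAsFactors = FALSE
)

# First request: strings made only of 0 & D, or only of 0 & d
i1new <- grepl("^([0D]*$|[0d]*$)", small$ch)

# Second request: first non-zero character is 'D' or 'd'
i2 <- grepl("^0*[Dd]", small$ch)

small[!i1new, ]  # rows kept under the first criterion
small[!i2, ]     # rows kept under the second criterion
```

Every string flagged by 'i1new' is also flagged by 'i2', except one made only of zeros.
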

Rui Barradas

> On Mon, Jul 2, 2012 at 7:24 PM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>> Hello,
>>
>> Try regular expressions instead.
>> In this data.frame, I've changed row nr.4 to have a row with 'D' as first
>> non-zero character.
>>
>> dd <- read.table(text="
>>
>> ch     count
>> 1  0000000000D0000000000000000000000000000000000000 0.007368
>> 2  0000000000d0000000000000000000000000000000000000 0.002456
>> 3  000000000T00000000000000000000000000000000000000 0.007368
>> 4  000000000DT0000000000000000000000000000000000000 0.007368
>>
>> 5  000000000T00000000000000000000000000000000000000 0.002456
>> 6  000000000Td0000000000000000000000000000000000000 0.002456
>> 7  00000000T000000000000000000000000000000000000000 0.007368
>> 8  00000000T0D0000000000000000000000000000000000000 0.007368
>> 9  00000000T000000000000000000000000000000000000000 0.002456
>> 10 00000000T0d0000000000000000000000000000000000000 0.002456
>> ", header=TRUE)
>> dd
>>
>> i1 <- grepl("^([0D]|[0d])*$", dd$ch)
>> i2 <- grepl("^0*[Dd]", dd$ch)
>>
>> dd[!i1, ]
>> dd[!i2, ]
>> dd[!(i1 | i2), ]
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>> Em 02-07-2012 23:48, Claudia Penaloza escreveu:
>>
>>> I would like to remove rows from the following data frame (df) if there are
>>> only two specific elements found in the df$ch character string (I want to
>>> remove rows with only "0" & "D" or "0" & "d"). Alternatively, I would like
>>> to remove rows if the first non-zero element is "D" or "d".
>>>
>>>
>>>                                                    ch     count
>>> 1  0000000000D0000000000000000000000000000000000000 0.007368;
>>> 2  0000000000d0000000000000000000000000000000000000 0.002456;
>>> 3  000000000T00000000000000000000000000000000000000 0.007368;
>>> 4  000000000TD0000000000000000000000000000000000000 0.007368;
>>> 5  000000000T00000000000000000000000000000000000000 0.002456;
>>> 6  000000000Td0000000000000000000000000000000000000 0.002456;
>>> 7  00000000T000000000000000000000000000000000000000 0.007368;
>>> 8  00000000T0D0000000000000000000000000000000000000 0.007368;
>>> 9  00000000T000000000000000000000000000000000000000 0.002456;
>>> 10 00000000T0d0000000000000000000000000000000000000 0.002456;
>>>
>>>
>>> I tried the following but it doesn't work if there is more than one
>>> character per string:
>>>
>>>> df <- df[!df$ch %in% c("0","D"),]
>>>> df <- df[!df$ch %in% c("0","d"),]
>>>
>>>
>>> Any help greatly appreciated,
>>> Claudia
>>>
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>


