[R] Removing rows if certain elements are found in character string

Rui Barradas ruipbarradas at sapo.pt
Tue Jul 3 18:14:25 CEST 2012


Hello,

I'm glad it helped. See answer inline.

On 03-07-2012 17:09, Claudia Penaloza wrote:
> Thank you Rui and Jim, both 'i1' and 'i1new' worked perfectly
> because there are no instances of 'Dd' or 'dD' in the data set (that I
> would/not want to include/exclude)... but I understand that 'i1new'
> targets precisely what I want.
> Why isn't a leading run of zeros required for either 'i1' or 'i1new', like so?
> i1newer <- grepl("^0{0,}[D]*$|^0{0,}[d]*$", dd$ch)
>

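Because both 'i1' and 'i1new' are anchored at the beginning and at the end
of the string and allow only '0' together with 'd' or 'D' (in the case of
'i1new', one or the other but not both).

The character classes already accept '0' at any position, so there's no need
to explicitly test for a string that begins with '0'.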
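
A small check of this point, using short made-up strings rather than the
real dd$ch values (the vector 'ch' below is hypothetical). It suggests that
the explicit 0{0,} prefix in 'i1newer' is not just unnecessary: it only
permits zeros before the letter, so strings with zeros after the 'D' or 'd'
would no longer match.

```r
# Hypothetical short strings standing in for dd$ch
ch <- c("00D00", "00d00", "00D", "000", "0Dd0", "0T00")

# Jim's 'i1new': only '0' and 'D', or only '0' and 'd', in any order
i1new <- grepl("^([0D]*$|[0d]*$)", ch)

# 'i1newer' from the question: zeros first, then only the letter
i1newer <- grepl("^0{0,}[D]*$|^0{0,}[d]*$", ch)

# Side-by-side view of the two results
data.frame(ch, i1new, i1newer)
```

On these strings the two patterns disagree only for "00D00" and "00d00",
where zeros follow the letter.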

Rui Barradas

> Thank you again,
> Claudia
> On Tue, Jul 3, 2012 at 2:06 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>
>     Hello,
>
>     Inline.
>
>     On 03-07-2012 01:15, jim holtman wrote:
>
>         You will have to change the 'i1' expression as follows:
>
>             i1 <- grepl("^([0D]|[0d])*$", dd$ch)
>             i1  # matches strings with d & D in them
>
>            [1]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>
>             # second string had 'd' & 'D' in it so it was TRUE above and
>             # FALSE below
>             i1new <- grepl("^([0D]*$|[0d]*$)", dd$ch)
>             i1new
>
>            [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>
>
>
>     Right, apparently, I forgot that grep is greedy, and the test cases
>     were not complete.
>
>
>
>         I put a 'd' and 'D' in the second string and the original regular
>         expression is equivalent to
>
>         grepl("^[0dD]*$", dd$ch)
>
>
>     This is only for the first request, and does not solve cases where
>     there are characters other than '0', 'd' or 'D', but 'd' or 'D' are
>     the first non-zero. This is the case of my 4th row, changed from the
>     OP's data example.
>
>     My regular expression for 'i2' is equivalent to this one, which I
>     believe is more readable:
>
>
>     i2b <- grepl("^0{0,}[Dd]", dd$ch)
>
>
>     First a '0', which may occur zero or more times, then a 'd' or 'D';
>     whatever follows, up to the end of the string, is irrelevant.
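
As a quick check of that reading, assuming a few made-up strings in 's'
(hypothetical, not the data from this thread): 0{0,} is the same quantifier
as 0*, and because the pattern has no $ anchor, anything may follow the
first 'd' or 'D'.

```r
s <- c("00DT0", "0d000", "0TD00", "D", "0000")  # hypothetical strings

i2  <- grepl("^0*[Dd]", s)      # first non-zero character is 'D' or 'd'
i2b <- grepl("^0{0,}[Dd]", s)   # same pattern with an explicit {0,}

identical(i2, i2b)   # the two patterns agree on these strings
s[!i2]               # strings that would be kept after removal
```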
>
>
>         which will match strings containing d, D and 0.  If you only want
>         'd' or 'D' (and not both), then you will have to use the one in
>         'i1new'.
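
To illustrate the difference, a minimal sketch on a hypothetical vector 'x'
(not the data above). With the alternation inside the repeated group, each
character may come from either class, so the pattern behaves like
^[0dD]*$. With the $ anchor inside each alternative, the whole string has
to come from a single class.

```r
x <- c("00D0", "00d0", "0dD0", "0T0")  # hypothetical test strings

inside  <- grepl("^([0D]|[0d])*$", x)     # choice made per character
merged  <- grepl("^[0dD]*$", x)           # the equivalent single class
outside <- grepl("^([0D]*$|[0d]*$)", x)   # choice made once per string

identical(inside, merged)   # the two agree on these strings
x[inside & !outside]        # strings that mix 'd' and 'D'
```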
>
>
>     To the OP: bottom line, use Jim's 'i1new' and my 'i2' or 'i2b'.
>
>     Rui Barradas
>
>
>         On Mon, Jul 2, 2012 at 7:24 PM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>
>             Hello,
>
>             Try regular expressions instead.
>             In this data.frame, I've changed row 4 so that 'D' is its
>             first non-zero character.
>
>             dd <- read.table(text="
>
>             ch     count
>             1  0000000000D0000000000000000000000000000000000000 0.007368
>             2  0000000000d0000000000000000000000000000000000000 0.002456
>             3  000000000T00000000000000000000000000000000000000 0.007368
>             4  000000000DT0000000000000000000000000000000000000 0.007368
>
>             5  000000000T00000000000000000000000000000000000000 0.002456
>             6  000000000Td0000000000000000000000000000000000000 0.002456
>             7  00000000T000000000000000000000000000000000000000 0.007368
>             8  00000000T0D0000000000000000000000000000000000000 0.007368
>             9  00000000T000000000000000000000000000000000000000 0.002456
>             10 00000000T0d0000000000000000000000000000000000000 0.002456
>             ", header=TRUE)
>             dd
>
>             i1 <- grepl("^([0D]|[0d])*$", dd$ch)
>             i2 <- grepl("^0*[Dd]", dd$ch)
>
>             dd[!i1, ]
>             dd[!i2, ]
>             dd[!(i1 | i2), ]
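
The three subsets above can be sketched on a miniature, hypothetical data
frame ('mini' is made up for illustration; the first pattern is the
stricter one called 'i1new' elsewhere in this thread). Negating the logical
vector keeps the rows that do not match, and !(i1 | i2) keeps only rows
that match neither pattern.

```r
# Hypothetical miniature of the data frame above
mini <- data.frame(ch = c("00D00", "00d00", "0T000", "0DT00", "0Td00"),
                   count = c(0.007368, 0.002456, 0.007368, 0.007368, 0.002456),
                   stringsAsFactors = FALSE)

i1 <- grepl("^([0D]*$|[0d]*$)", mini$ch)  # only '0' with 'D', or only '0' with 'd'
i2 <- grepl("^0*[Dd]", mini$ch)           # first non-zero character is 'D' or 'd'

keep <- !(i1 | i2)                        # same as !i1 & !i2
mini[keep, ]                              # rows matching neither pattern
```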
>
>
>             Hope this helps,
>
>             Rui Barradas
>
>             On 02-07-2012 23:48, Claudia Penaloza wrote:
>
>                 I would like to remove rows from the following data frame
>                 (df) if there are only two specific elements found in the
>                 df$ch character string (I want to remove rows with only
>                 "0" & "D" or "0" & "d"). Alternatively, I would like to
>                 remove rows if the first non-zero element is "D" or "d".
>
>
>                                                                   ch    count
>                 1  0000000000D0000000000000000000000000000000000000 0.007368;
>                 2  0000000000d0000000000000000000000000000000000000 0.002456;
>                 3  000000000T00000000000000000000000000000000000000 0.007368;
>                 4  000000000TD0000000000000000000000000000000000000 0.007368;
>                 5  000000000T00000000000000000000000000000000000000 0.002456;
>                 6  000000000Td0000000000000000000000000000000000000 0.002456;
>                 7  00000000T000000000000000000000000000000000000000 0.007368;
>                 8  00000000T0D0000000000000000000000000000000000000 0.007368;
>                 9  00000000T000000000000000000000000000000000000000 0.002456;
>                 10 00000000T0d0000000000000000000000000000000000000 0.002456;
>
>
>                 I tried the following but it doesn't work if there is more
>                 than one character per string:
>
>                     df <- df[!df$ch %in% c("0","D"),]
>                     df <- df[!df$ch %in% c("0","d"),]
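
A likely reason this fails: %in% compares each whole element of df$ch with
the values on the right-hand side, so it is TRUE only when an entire string
equals "0" or "D". It does not look at the characters inside a string. A
minimal sketch on hypothetical values, with grepl shown for contrast:

```r
ch <- c("0", "D", "00D0", "0T0")  # hypothetical values

whole <- ch %in% c("0", "D")      # whole-string equality
chars <- grepl("^[0D]*$", ch)     # every character is '0' or 'D'

data.frame(ch, whole, chars)
```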
>
>
>
>                 Any help greatly appreciated,
>                 Claudia
>
>                           [[alternative HTML version deleted]]
>
>                 ________________________________________________
>                 R-help at r-project.org mailing list
>                 https://stat.ethz.ch/mailman/listinfo/r-help
>                 PLEASE do read the posting guide
>                 http://www.R-project.org/posting-guide.html
>                 and provide commented, minimal, self-contained,
>                 reproducible code.
>
>


