[R] How to find series of small numbers in a big vector?
    jim holtman 
    jholtman at gmail.com
       
    Tue Jan 30 13:23:38 CET 2007
    
    
  
You can use 'rle' (run-length encoding):
> search_range <- c(0.021, 0.029) # inclusive searching
> search_length <- 5   # find ALL series of 5 members within search_range
> my_data <- c(0.900, 0.900, 0.900, 0.900, 0.900,
+             0.900, 0.900, 0.900, 0.900, 0.900,
+             0.900, 0.028, 0.024, 0.027, 0.023,
+             0.022, 0.900, 0.900, 0.900, 0.900,
+             0.900, 0.900, 0.024, 0.029, 0.023,
+             0.025, 0.026, 0.900, 0.900, 0.900,
+             0.900, 0.900, 0.900, 0.900, 0.900,
+             0.900, 0.900, 0.900, 0.900, 0.022,
+             0.023, 0.025, 0.333, 0.027, 0.028,
+             0.900, 0.900, 0.900, 0.900, 0.900)
> # logical vector: TRUE where the value lies within search_range (inclusive)
> series <- (my_data >= search_range[1]) & (my_data <= search_range[2])
> # determine the 'runs' of consecutive TRUE/FALSE values
> runs <- rle(series)
> # find runs of TRUE values that are at least search_length long
> long_runs <- which((runs$lengths >= search_length) & (runs$values))
> # create a data frame of the start and end index of each qualifying run
> run_ends <- cumsum(runs$lengths)
> result <- data.frame(start = run_ends[long_runs] - runs$lengths[long_runs] + 1,
+     end = run_ends[long_runs])
> result
  start end
1    12  16
2    23  27
>
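The same rle() steps can be wrapped in a small function for reuse on the full 120,000-value vector. This is a minimal sketch: the name find_series() and its arguments are hypothetical, not part of base R. The bounds are inclusive as in the question, and NA values (an assumption about the real data) are treated as out of range.

```r
# Hypothetical helper wrapping the rle() approach: returns the start and end
# index of every run of at least min_len consecutive values x with
# lower <= x <= upper (inclusive). NA values count as out of range.
find_series <- function(x, lower, upper, min_len) {
  in_range <- !is.na(x) & x >= lower & x <= upper
  runs <- rle(in_range)
  ends <- cumsum(runs$lengths)          # last index of each run
  starts <- ends - runs$lengths + 1L    # first index of each run
  keep <- runs$values & runs$lengths >= min_len
  data.frame(start = starts[keep], end = ends[keep])
}

my_data <- c(0.900, 0.900, 0.900, 0.900, 0.900,
             0.900, 0.900, 0.900, 0.900, 0.900,
             0.900, 0.028, 0.024, 0.027, 0.023,
             0.022, 0.900, 0.900, 0.900, 0.900,
             0.900, 0.900, 0.024, 0.029, 0.023,
             0.025, 0.026, 0.900, 0.900, 0.900,
             0.900, 0.900, 0.900, 0.900, 0.900,
             0.900, 0.900, 0.900, 0.900, 0.022,
             0.023, 0.025, 0.333, 0.027, 0.028,
             0.900, 0.900, 0.900, 0.900, 0.900)

find_series(my_data, 0.021, 0.029, 5)
#   start end
# 1    12  16
# 2    23  27
```

Changing min_len changes which runs qualify: with min_len = 3 the run at indices 40 to 42 is also reported, because the 0.333 at index 43 ends that series.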
On 1/30/07, Jonne Zutt <j.zutt at tudelft.nl> wrote:
> I suggest the following approach
>
> This gives TRUE for all data within the search_range
>        A1 = my_data > search_range[1] & my_data < search_range[2]
>
> which() gives us the indices
>        A2 = which(A1)
>
> and diff() the gaps between those intervals
>        A3 = diff(A2)
>
> Hence, if A3 > search_length, we have enough consecutive numbers within
> the search range
>
> Finally, is this what you wanted to know?
>
>        A2[ which(A3 > search_length) ]
>
>
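Note that the test A3 > search_length above flags the last in-range index before a gap of more than search_length positions. That is not the same as finding search_length consecutive in-range values: an isolated in-range value followed by a long gap would also be reported, and no start index is returned. The strict > and < also drop the endpoints of the inclusive range (the 0.029 at index 24 would be excluded). A minimal sketch that keeps the which()/diff() idea but swaps in a different grouping step (cumsum() and split() to collect consecutive indices); the variable names idx, group, long and result are illustrative.

```r
search_range <- c(0.021, 0.029)   # inclusive searching
search_length <- 5
my_data <- c(0.900, 0.900, 0.900, 0.900, 0.900,
             0.900, 0.900, 0.900, 0.900, 0.900,
             0.900, 0.028, 0.024, 0.027, 0.023,
             0.022, 0.900, 0.900, 0.900, 0.900,
             0.900, 0.900, 0.024, 0.029, 0.023,
             0.025, 0.026, 0.900, 0.900, 0.900,
             0.900, 0.900, 0.900, 0.900, 0.900,
             0.900, 0.900, 0.900, 0.900, 0.022,
             0.023, 0.025, 0.333, 0.027, 0.028,
             0.900, 0.900, 0.900, 0.900, 0.900)

# indices of all values inside the (inclusive) range
idx <- which(my_data >= search_range[1] & my_data <= search_range[2])
# start a new group wherever two neighbouring in-range indices are not adjacent
group <- cumsum(c(1, diff(idx) != 1))
groups <- split(idx, group)
# keep groups with at least search_length members; report first and last index
long <- groups[lengths(groups) >= search_length]
result <- data.frame(start = unname(vapply(long, min, integer(1))),
                     end   = unname(vapply(long, max, integer(1))))
result
#   start end
# 1    12  16
# 2    23  27
```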
> On Mon, 2007-01-29 at 17:49 -0800, Ed Holdgate wrote:
> > Hello:
> >
> > I have a vector with 120,000 reals
> > between 0.00000 and 0.9999
> >
> > They are not sorted but the vector index is the
> > time-order of my measurements, and therefore
> > cannot be lost.
> >
> > How do I use R to find the starting and ending
> > index of ANY and ALL the "series" or "sequences"
> > in that vector where ever there are 5 or more
> > members in a row between 0.021 and 0.029 ?
> >
> > For example:
> >
> > search_range <- c (0.021, 0.029) # inclusive searching
> > search_length <- 5   # find ALL series of 5 members within search_range
> > my_data <- c(0.900, 0.900, 0.900, 0.900, 0.900,
> >              0.900, 0.900, 0.900, 0.900, 0.900,
> >              0.900, 0.028, 0.024, 0.027, 0.023,
> >              0.022, 0.900, 0.900, 0.900, 0.900,
> >              0.900, 0.900, 0.024, 0.029, 0.023,
> >              0.025, 0.026, 0.900, 0.900, 0.900,
> >              0.900, 0.900, 0.900, 0.900, 0.900,
> >              0.900, 0.900, 0.900, 0.900, 0.022,
> >              0.023, 0.025, 0.333, 0.027, 0.028,
> >              0.900, 0.900, 0.900, 0.900, 0.900)
> >
> > I seek the R program to report:
> > start_index of 12 and an end_index of 16
> > -- and also --
> > start_index of 23 and an end_index of 27
> > because that is where there happen to be
> > search_length numbers within my search_range.
> >
> > It should _not_ report the series at start_index 40
> > because that 0.333 in there violates the search_range.
> >
> > I could brute-force hard-code an R program, but
> > perhaps an expert can give me a tip for an
> > easy, elegant existing function or a tactic
> > to approach?
> >
> > Execution speed or algorithm performance is not,
> > for me in this case, important.  Rather, I
> > seek an easy R solution to find the time windows
> > (starting & ending indices) where 5 or more
> > small numbers in my search_range were measured
> > all in a row.
> >
> > Advice welcome and many thanks in advance.
> >
> > Ed Holdgate
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?
    
    