[R] Making a table: collapsing across sub-strings

jim holtman jholtman at gmail.com
Thu Oct 4 14:46:51 CEST 2007


How many strings are there?  You could use 'outer' and 'regexpr' to
determine which strings are substrings of another and then group them.
Knowing roughly how many strings you will be searching through, and how
you want the hierarchy printed out, would help in coming up with a
solution.
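
A rough sketch of that idea, in case it helps.  The sample data and the
helper names ('contains', 'is_sub', 'absorbed', 'collapsed') are made up
for illustration; only 'my_set' and 'TB' come from the original question.
It assumes a sub-string may match anywhere, treats the strings as literal
text rather than regular expressions, and credits a string that sits
inside several longer strings to the longest of them, which may or may
not be the hierarchy you want.

```r
# Hypothetical sample data standing in for the real set of strings.
my_set <- c("abab", "ababaaa", "abab", "xyz", "ababaaa", "xy", "ab")

TB     <- table(my_set)
strs   <- names(TB)       # the distinct strings
counts <- as.vector(TB)   # their counts as a plain integer vector

# Order the distinct strings from longest to shortest.
ord    <- order(nchar(strs), decreasing = TRUE)
strs   <- strs[ord]
counts <- counts[ord]

# is_sub[i, j] is TRUE when strs[i] occurs anywhere inside strs[j].
# regexpr() is not vectorised over 'pattern', so it is wrapped with
# Vectorize() before being handed to outer().
contains <- function(pat, txt) regexpr(pat, txt, fixed = TRUE) > 0
is_sub   <- outer(strs, strs, Vectorize(contains))
diag(is_sub) <- FALSE     # a string does not count as its own sub-string

# Add each contained string's count to the longest string containing it.
collapsed <- setNames(counts, strs)
absorbed  <- logical(length(strs))
for (i in seq_along(strs)) {
  parents <- which(is_sub[i, ])
  if (length(parents) > 0) {
    top <- parents[1]     # first in the ordering, so the longest
    collapsed[top] <- collapsed[top] + counts[i]
    absorbed[i] <- TRUE
  }
}
collapsed <- collapsed[!absorbed]
print(collapsed)
```

Since 'outer' builds an n-by-n matrix over the distinct strings, this is
only practical when the number of distinct strings is moderate.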

On 10/4/07, Dieter Vanderelst <dieter_vanderelst at emailengine.org> wrote:
> Hi,
>
> A sub string can occur anywhere in the main string.
>
> I think I could use TABLE and then add the numbers. But I don't know how
> to access the numbers in the result of table.
>
> Another problem is that there might be a hierarchy in the strings. That
> is, string a might be a subset of b while b might be a subset of c. So,
> when checking the strings, I would have to start with the longest string
> and find all subsets of that one. And then I should check the second
> longest string and so on...
>
> But I cannot find a way of ordering strings by their length.
>
> Regards,
> Dieter
>
> jim holtman wrote:
> > How do you determine if one string is a subset of another?  Does it
> > only match at the beginning, or anywhere?  How large is your set of
> > strings?  Can you use table as you describe and then determine what
> > the groupings of subsets are and then just add the numbers together?
> > You can use grep/regexpr to determine if one string is a subset of
> > another.
> >
> > On 10/3/07, Dieter Vanderelst <dieter_vanderelst at emailengine.org> wrote:
> >> Hi list,
> >>
> >> I'm currently processing textual data and I would really appreciate some
> >> help with one of my problems.
> >>
> >> I have a set of strings and I want to count how often each of these
> >> strings appears in this set.
> >>
> >> This is not very difficult and can be done as:
> >>
> >> TB<-table(my_set)
> >> plot(TB)
> >>
> >> However, I also want to collapse across sub-strings. That is, I want a
> >> sub-string ss of string S to be counted as an occurrence of string S.
> >>
> >> So, 'abab' should be included in the count of 'ababaaa' and should not
> >> be listed as a separate entry in the frequency table.
> >>
> >> Does anybody have a pointer to a way to do this? I have been checking
> >> out the CRAN packages for handling DNA sequences, but this has not
> >> really brought me closer to a solution.
> >>
> >> Thanks,
> >> Dieter Vanderelst
> >>
> >> ------------------------------------------
> >> Dieter Vanderelst
> >> Eindhoven University of Technology
> >> Faculty of Industrial Design
> >> Designed Intelligence Group
> >> Den Dolech 2
> >> 5612 AZ Eindhoven
> >> The Netherlands
> >> Tel +31 40 247 91 11
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
> >
>


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


