[R] Data frame manipulation - newbie question

jim holtman jholtman at gmail.com
Mon Jan 7 02:41:18 CET 2008


There are several ways to manipulate your data to get what you want, and it
is worth learning a few of these techniques.  Here, I think, is the set of
actions that you want to perform.

> x <- read.table(textConnection("row  k.idx      step.forwd   pt.num    model   prev   value    abs.error
+ 1   200   0   1   lm    09    10.5  1.5
+ 2   200   0   2   lm    11    10.5  1.5
+ 3   201   1   1   lm    10    12    2.0
+ 4   201   1   2   lm    12    12    2.0
+ 5   202   2   1   lm    12    12.1  0.1
+ 6   202   2   2   lm    12    12.1  0.1
+ 7   200   0   1   rlm   10.1  10.5  0.4
+ 8   200   0   2   rlm   10.3  10.5  0.2
+ 9   201   1   1   rlm   11.6  12    0.4
+ 10  201   1   2   rlm   11.4  12    0.6
+ 11  202   2   1   rlm   11.8  12.1  0.1
+ 12  202   2   2   rlm   11.9  12.1  0.2"), header=TRUE)
> closeAllConnections()
>
> # split the data by the grouping factors
> x.split <- split(x, list(x$k.idx, x$step.forwd, x$model), drop=TRUE)
> x.split
$`200.0.lm`
  row k.idx step.forwd pt.num model prev value abs.error
1   1   200          0      1    lm    9  10.5       1.5
2   2   200          0      2    lm   11  10.5       1.5

$`201.1.lm`
  row k.idx step.forwd pt.num model prev value abs.error
3   3   201          1      1    lm   10    12         2
4   4   201          1      2    lm   12    12         2

$`202.2.lm`
  row k.idx step.forwd pt.num model prev value abs.error
5   5   202          2      1    lm   12  12.1       0.1
6   6   202          2      2    lm   12  12.1       0.1

$`200.0.rlm`
  row k.idx step.forwd pt.num model prev value abs.error
7   7   200          0      1   rlm 10.1  10.5       0.4
8   8   200          0      2   rlm 10.3  10.5       0.2

$`201.1.rlm`
   row k.idx step.forwd pt.num model prev value abs.error
9    9   201          1      1   rlm 11.6    12       0.4
10  10   201          1      2   rlm 11.4    12       0.6

$`202.2.rlm`
   row k.idx step.forwd pt.num model prev value abs.error
11  11   202          2      1   rlm 11.8  12.1       0.1
12  12   202          2      2   rlm 11.9  12.1       0.2

>
> # now take the means of given columns
> x.mean <- lapply(x.split, function(.grp) colMeans(.grp[, c('prev', 'value', 'abs.error')]))
>
> # put back into a matrix
> (x.mean <- do.call(rbind, x.mean))
           prev value abs.error
200.0.lm  10.00  10.5      1.50
201.1.lm  11.00  12.0      2.00
202.2.lm  12.00  12.1      0.10
200.0.rlm 10.20  10.5      0.30
201.1.rlm 11.50  12.0      0.50
202.2.rlm 11.85  12.1      0.15
>
> # boxplot of abs.error for each k.idx
> boxplot(abs.error ~ k.idx, data=x)
>
> # create a table with average of the abs.error for each 'model'
> cbind(x, abs.error.mean=ave(x$abs.error, x$model))
   row k.idx step.forwd pt.num model prev value abs.error abs.error.mean
1    1   200          0      1    lm  9.0  10.5       1.5      1.2000000
2    2   200          0      2    lm 11.0  10.5       1.5      1.2000000
3    3   201          1      1    lm 10.0  12.0       2.0      1.2000000
4    4   201          1      2    lm 12.0  12.0       2.0      1.2000000
5    5   202          2      1    lm 12.0  12.1       0.1      1.2000000
6    6   202          2      2    lm 12.0  12.1       0.1      1.2000000
7    7   200          0      1   rlm 10.1  10.5       0.4      0.3166667
8    8   200          0      2   rlm 10.3  10.5       0.2      0.3166667
9    9   201          1      1   rlm 11.6  12.0       0.4      0.3166667
10  10   201          1      2   rlm 11.4  12.0       0.6      0.3166667
11  11   202          2      1   rlm 11.8  12.1       0.1      0.3166667
12  12   202          2      2   rlm 11.9  12.1       0.2      0.3166667
>
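The split/lapply/do.call steps above can, as far as I can tell, also be collapsed into a single aggregate() call using its formula interface. This is a sketch, not the only way: it assumes the grouping columns are factors as the question states, and it rebuilds the data frame with data.frame() (same values as the read.table() call) so the block runs on its own.

```r
# Example data from the thread, rebuilt column by column so this block is
# self-contained (same values as the read.table() call above)
x <- data.frame(
  k.idx      = factor(rep(rep(200:202, each = 2), 2)),
  step.forwd = factor(rep(rep(0:2, each = 2), 2)),
  pt.num     = factor(rep(1:2, 6)),
  model      = factor(rep(c("lm", "rlm"), each = 6)),
  prev       = c(9, 11, 10, 12, 12, 12, 10.1, 10.3, 11.6, 11.4, 11.8, 11.9),
  value      = rep(c(10.5, 10.5, 12, 12, 12.1, 12.1), 2),
  abs.error  = c(1.5, 1.5, 2, 2, 0.1, 0.1, 0.4, 0.2, 0.4, 0.6, 0.1, 0.2)
)

# Mean of each numeric column for every k.idx/step.forwd/model combination;
# aggregate() returns only the combinations that actually occur in the data
x.mean <- aggregate(cbind(prev, value, abs.error) ~ k.idx + step.forwd + model,
                    data = x, FUN = mean)
print(x.mean)
```

Unlike the do.call(rbind, ...) version, the result keeps k.idx, step.forwd and model as ordinary columns instead of packing them into row names, which tends to be handier for plotting or merging later.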

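The boxplot above groups abs.error by k.idx only. The question asks to see each model at each k.idx; one reading of that (an interpretation, since with two points per group a boxplot of the group means alone would be degenerate) is to put both factors on the right-hand side of the formula. A sketch, with only the needed columns rebuilt so it runs standalone; the name b is illustrative.

```r
# Example data from the thread (only the columns the plot needs)
x <- data.frame(
  k.idx     = factor(rep(rep(200:202, each = 2), 2)),
  model     = factor(rep(c("lm", "rlm"), each = 6)),
  abs.error = c(1.5, 1.5, 2, 2, 0.1, 0.1, 0.4, 0.2, 0.4, 0.6, 0.1, 0.2)
)

# One box per model/k.idx combination, labelled e.g. "lm.200", "rlm.200", ...
# las = 2 turns the axis labels sideways; boxplot() invisibly returns the
# statistics it drew, which we keep in b
b <- boxplot(abs.error ~ model + k.idx, data = x, las = 2)
print(b$names)
```

Placing model first in the formula keeps the lm and rlm boxes for the same k.idx next to each other, which makes the per-k.idx comparison easier to read.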

On Jan 6, 2008 10:50 AM, Rense Nieuwenhuis <rense.nieuwenhuis at gmail.com> wrote:
> Hi,
>
> you may want to use the apply / tapply functions. Some find them a bit
> hard to grasp at first, but once you get the hang of them they will
> help you in many situations.
>
> Maybe you can get some information on my site:
> http://www.rensenieuwenhuis.nl/r-project/manual/basics/tables/
>
>
> Hope this helps,
>
> Rense Nieuwenhuis
>
>
>
> On Jan 3, 2008, at 11:53, José Augusto M. de Andrade Junior wrote:
>
> > Hi all,
> >
> > Could someone please explain how I can efficiently query a data frame
> > with several factors, as shown below:
> >
> > ----------------------------------------------------------------------
> > Data frame: pt.knn
> > ----------------------------------------------------------------------
> > row  | k.idx | step.forwd | pt.num | model | prev | value | abs.error
> > 1    | 200   | 0          | 1      | lm    | 09   | 10.5  | 1.5
> > 2    | 200   | 0          | 2      | lm    | 11   | 10.5  | 1.5
> > 3    | 201   | 1          | 1      | lm    | 10   | 12    | 2.0
> > 4    | 201   | 1          | 2      | lm    | 12   | 12    | 2.0
> > 5    | 202   | 2          | 1      | lm    | 12   | 12.1  | 0.1
> > 6    | 202   | 2          | 2      | lm    | 12   | 12.1  | 0.1
> > 7    | 200   | 0          | 1      | rlm   | 10.1 | 10.5  | 0.4
> > 8    | 200   | 0          | 2      | rlm   | 10.3 | 10.5  | 0.2
> > 9    | 201   | 1          | 1      | rlm   | 11.6 | 12    | 0.4
> > 10   | 201   | 1          | 2      | rlm   | 11.4 | 12    | 0.6
> > 11   | 202   | 2          | 1      | rlm   | 11.8 | 12.1  | 0.1
> > 12   | 202   | 2          | 2      | rlm   | 11.9 | 12.1  | 0.2
> > ----------------------------------------------------------------------
> >
> > k.idx, step.forwd, pt.num and model columns are FACTORS.
> > prev, value, abs.error are numeric
> >
> > I need to take the mean value of the numeric columns (prev, value and
> > abs.error) for each combination of k.idx, step.forwd and model. So rows
> > 1 and 2, 3 and 4, 5 and 6, 7 and 8, 9 and 10, and 11 and 12 must be
> > grouped together.
> >
> > Next, I need to draw a boxplot of mean(abs.error) for each model at
> > each k.idx.
> > I also need to compare the abs.error of the two models at each step,
> > and the overall mean abs.error of each model, and so on.
> >
> > I have read the manuals, but the examples there are too simple. I know
> > how to do this manipulation by "brute force", but I want to learn the
> > right way to work with R.
> >
> > Could someone help me?
> > Thanks in advance.
> >
> > José Augusto
> > Undergraduate student
> > University of São Paulo
> > Business Administration Faculty
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
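Following Rense's tapply() suggestion above, the remaining comparisons in the question (the two models at each step, and the overall mean abs.error of each model) seem to reduce to a couple of tapply() calls. A sketch, with only the needed columns rebuilt so it runs standalone; the object names by.step, diff.step and by.model are illustrative, not from the thread.

```r
# Example data from the thread (only the columns these summaries need)
x <- data.frame(
  step.forwd = factor(rep(rep(0:2, each = 2), 2)),
  model      = factor(rep(c("lm", "rlm"), each = 6)),
  abs.error  = c(1.5, 1.5, 2, 2, 0.1, 0.1, 0.4, 0.2, 0.4, 0.6, 0.1, 0.2)
)

# Mean abs.error for each step (rows) and each model (columns)
by.step <- tapply(x$abs.error,
                  list(step.forwd = x$step.forwd, model = x$model), mean)

# Difference between the models at each step (positive: lm has the larger error)
diff.step <- by.step[, "lm"] - by.step[, "rlm"]

# Overall mean abs.error of each model (same numbers as the ave() column above)
by.model <- tapply(x$abs.error, x$model, mean)

print(by.step)
print(diff.step)
print(by.model)
```

tapply() returns a matrix when given a list of two factors, so by.step can be indexed by level names, e.g. by.step["1", "rlm"].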



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?



