[R] Calculating proportions from a data frame rather than a table
Deepayan Sarkar
deepayan.sarkar at gmail.com
Thu Oct 4 00:22:06 CEST 2007
On 10/3/07, Farrel Buchinsky <fjbuch at gmail.com> wrote:
> Thank you. It comes close but not exactly what I wanted. I had to
> scrap my column that contained character values. That column noted the
> name of the study. Let me try to show you here.
>
> Best if viewed in courier font
>
> > coinfection
> study HPV6 HPV11 CoInfect other
> 1 Wiatrak 2004 31 23 4 0
> 2 Draganov 2006 6 14 3 0
> 3 Gabbott 1997 19 24 1 0
> 4 Gerein 2005 17 14 0 7
> 5 Michael 2005 8 5 0 1
> 6 Rabah 2001 29 32 0 0
> 7 Maloney 2006 4 4 7 0
>
> > str(coinfection)
> 'data.frame': 7 obs. of 5 variables:
> $ study : chr "Wiatrak 2004" "Draganov 2006" "Gabbott 1997" "Gerein 2005" ...
> $ HPV6 : num 31 6 19 17 8 29 4
> $ HPV11 : num 23 14 24 14 5 32 4
> $ CoInfect: num 4 3 1 0 0 0 7
> $ other : num 0 0 0 7 1 0 0
>
> I had tried the following and was getting nowhere
> > as.table(coinfection)
> Error in as.table.default(coinfection) : cannot coerce into a table
> > as.table(coinfection[,-1])
> Error in as.table.default(coinfection[, -1]) :
> cannot coerce into a table
>
> Thanks to you I was able to make some progress.
>
> > as.table(as.matrix(coinfection))
> study HPV6 HPV11 CoInfect other
> 1 Wiatrak 2004 31 23 4 0
> 2 Draganov 2006 6 14 3 0
> 3 Gabbott 1997 19 24 1 0
> 4 Gerein 2005 17 14 0 7
> 5 Michael 2005 8 5 0 1
> 6 Rabah 2001 29 32 0 0
> 7 Maloney 2006 4 4 7 0
> SO FAR THIS LOOKS GOOD BUT THEN LOOK
>
>
> > prop.table(as.table(as.matrix(coinfection)),1)#the main reason for doing this
> Error in sum(..., na.rm = na.rm) : invalid 'type' (character) of argument
>
> > prop.table(as.table(as.matrix(coinfection[,-1])),1)#this is to get rid of the variable called "study"
> HPV6 HPV11 CoInfect other
> 1 0.53448276 0.39655172 0.06896552 0.00000000
> 2 0.26086957 0.60869565 0.13043478 0.00000000
> 3 0.43181818 0.54545455 0.02272727 0.00000000
> 4 0.44736842 0.36842105 0.00000000 0.18421053
> 5 0.57142857 0.35714286 0.00000000 0.07142857
> 6 0.47540984 0.52459016 0.00000000 0.00000000
> 7 0.26666667 0.26666667 0.46666667 0.00000000
>
> WORKS PERFECTLY, EXACTLY WHAT I WANTED EXCEPT I HAVE LOST THE NAME OF
> THE STUDY AND HAVE TO GO BACK TO LOOK AT WHICH DATA BELONGS TO WHICH
> STUDY. THIS WOULD NOT HAVE HAPPENED IF I HAD THE DATA IN ITS RAWEST
> FORM: A TWO COLUMN DATA FRAME WHERE COLUMN ONE WAS THE STUDY AND
> COLUMN 2 WAS A FACTOR (LEVELS BEING hpv 6, hpv 11, coinfection,
> other). SUCH A DATA FRAME WOULD HAVE HAD 253 rows. Then I could have
> used table(column1,column2) and I could have got all this data as a
> table and the study name would be preserved. It is not that big a deal
> that I have to look elsewhere to find the study name but it seems
> silly that I cannot analyze data that is not in the raw state. I am
> sure there is a way. I just do not know it.
Try making $study the row names (which they are for your `table'), and
everything should be fine:
row.names(coinfection) <- coinfection$study
coinfection$study <- NULL
prop.table(as.matrix(coinfection), 1) # row proportions, labelled by study
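
For anyone who wants to try this without the original object, here is a
minimal self-contained sketch. It rebuilds the data frame from the printed
output quoted above and, rather than modifying coinfection in place, copies
the numeric columns into a matrix. The names counts and props are made up for
illustration, and margin = 1 is assumed because the quoted calls asked for
row-wise proportions.

```r
# Rebuild the data frame from the printed output in the quoted message.
coinfection <- data.frame(
  study    = c("Wiatrak 2004", "Draganov 2006", "Gabbott 1997", "Gerein 2005",
               "Michael 2005", "Rabah 2001", "Maloney 2006"),
  HPV6     = c(31, 6, 19, 17, 8, 29, 4),
  HPV11    = c(23, 14, 24, 14, 5, 32, 4),
  CoInfect = c(4, 3, 1, 0, 0, 0, 7),
  other    = c(0, 0, 0, 7, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Keep only the numeric columns and carry the study names as row names
# (counts is an illustrative name, not part of the original thread).
counts <- as.matrix(coinfection[, -1])
rownames(counts) <- coinfection$study

# margin = 1 divides every row by its own total, so each row sums to 1.
props <- prop.table(counts, margin = 1)
print(round(props, 3))
```

Each row of props should sum to 1, and the study names print alongside the
proportions because they travel with the matrix as row names.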
-Deepayan