[R] Ambiguities in vector

Mon Oct 8 15:35:01 CEST 2007

Hello James,

all of your suggestions work very well except for this:

FemMal <- cbind(FemV1gezählt[2,], MalV1gezählt[2,])

colnames(FemMal) <- ("Females", "Males")
Error: syntax error

FemMal

  [,1] [,2]
1  133   79
2  203  237
3   51   76

But it works if I do this:

Namen<-c("Female","Male")
colnames(FemMal) <- (Namen)

FemMal

   Female Male
1    133   79
2    203  237
3     51   76
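
The difference between the two attempts is the call to c(): a comma-separated list in bare parentheses is not valid R, whereas c() combines the strings into one character vector. A minimal sketch, with the matrix rebuilt from the counts printed above purely for illustration (FemMal2 is a hypothetical name, not part of the original session):

```r
# Counts copied from the output above; the matrix is rebuilt here
# only so that the example runs on its own.
FemMal <- cbind(c(133, 203, 51), c(79, 237, 76))
rownames(FemMal) <- c("1", "2", "3")

# c() builds a character vector of length 2, which is what colnames<- expects.
colnames(FemMal) <- c("Females", "Males")
FemMal

# Alternative: name the columns directly in the cbind() call.
FemMal2 <- cbind(Females = c(133, 203, 51), Males = c(79, 237, 76))
colnames(FemMal2)
```

Either form should avoid the syntax error; the intermediate variable Namen is not needed.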

Greetings

Birgit

On 04.10.2007 at 17:19, James Reilly wrote:

>
> Hi Birgit,
>
> First, can I suggest that you don't copy off-list conversations to  
> the mailing list partway through? Not that I minded in this case,  
> but it probably confuses people and the posting guide warns against  
> it.
>
> I'll address your questions in reverse order.
>
> To get tables for each column, try:
> apply(FemV1Test, 2, table)
>
> Likewise for males:
> apply(MalV1, 2, table)
>
> To compare them, perhaps put them side by side:
> FemMal <- cbind(apply(FemV1Test, 2, table)[2,], apply(MalV1, 2,  
> table)[2,])
> colnames(FemMal) <- ("Females", "Males")
> FemMal
>
> You can then do arithmetic, plot them, sort by the difference, etc.
> plot(FemMal)
> FemMal[order(FemMal[,1]-FemMal[,2]),]
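
A small sketch of the steps above on made-up data. The two 0/1 matrices below are hypothetical stand-ins for FemV1Test and MalV1, and the column names are given with c(). Note that apply(..., 2, table) only returns a matrix when every column contains both 0 and 1; for 0/1 data, colSums() gives the same counts of ones without that restriction.

```r
# Hypothetical 0/1 indicator matrices (rows = species, columns = codes 1..3).
FemV1Test <- cbind("1" = c(1, 1, 0, 1), "2" = c(0, 1, 1, 0), "3" = c(0, 0, 1, 1))
MalV1     <- cbind("1" = c(1, 0, 0, 1), "2" = c(1, 1, 1, 0), "3" = c(0, 1, 0, 0))

# One table per column; row 2 holds the counts of the value 1.
FemMal <- cbind(apply(FemV1Test, 2, table)[2, ],
                apply(MalV1, 2, table)[2, ])
colnames(FemMal) <- c("Females", "Males")
FemMal

# Same counts via colSums(), which also works if a column is all 0 or all 1.
FemMalSums <- cbind(Females = colSums(FemV1Test), Males = colSums(MalV1))

# Sort the rows by the female-minus-male difference.
sorted <- FemMal[order(FemMal[, 1] - FemMal[, 2]), ]
sorted
```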
>
> About crossprod, cell (i,j) in the resulting matrix shows the  
> number of cases with a 1 for attribute i  and attribute j. This  
> shows which attributes overlap most and least.
>
> The command "tab <- tab - diag(diag(tab))" puts zeroes down the  
> diagonal, as was requested. One cosmetic reason for doing this is  
> that the diagonal elements are often much larger than the  
> off-diagonal ones, and zeroing them makes the table easier to read or  
> display graphically. E.g.
> http://pbil.univ-lyon1.fr/ADE-4/ade4-html/table.dist.html	
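
To make the two points about crossprod and the diagonal concrete, here is a short sketch using the small ratings example from the earlier message quoted further down. The names X and tab0 are introduced here only for illustration.

```r
# The ratings example from the earlier message in this thread.
ratings <- data.frame(id   = c(1, 2, 3, 4),
                      att1 = c(1, 1, 0, 1),
                      att2 = c(1, 0, 0, 1),
                      att3 = c(0, 1, 1, 1))
X <- as.matrix(ratings[, -1])

# crossprod(X) is t(X) %*% X: cell (i, j) counts rows where both
# attribute i and attribute j are 1.
tab <- crossprod(X)
tab

# The same count for one pair, computed by hand.
both <- sum(X[, "att1"] == 1 & X[, "att2"] == 1)

# diag(tab) extracts the diagonal; diag() of that vector builds a
# diagonal matrix, so the subtraction zeroes the diagonal only.
tab0 <- tab - diag(diag(tab))
tab0
```

The diagonal of tab is simply the number of ones per attribute, so the cells do not count how often two attributes have the same value, only how often both are 1.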
>
> Yes, any row with all NAs will make the crossprod all NAs too. You  
> can ignore any rows with NAs as follows:
> CrossFemMal1_3<-crossprod(as.matrix(CrossFemMalVar1_3[apply 
> (CrossFemMalVar1_3, 1, function (x) !any(is.na(x))),]))
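
A sketch of the same row filter on a tiny made-up data frame (the values are hypothetical; only the object name is taken from the thread). complete.cases() from base R is shown as an equivalent and arguably more readable way to drop rows containing NA.

```r
# Hypothetical data: the third row is entirely NA.
CrossFemMalVar1_3 <- data.frame(V1 = c(1, 0, NA, 1),
                                V2 = c(1, 1, NA, 0),
                                V3 = c(0, 1, NA, 1))

# With the NA row included, every cell of the cross-product is NA.
withNA <- crossprod(as.matrix(CrossFemMalVar1_3))

# Keep only rows without any NA, as in the command above.
keep <- apply(CrossFemMalVar1_3, 1, function(x) !any(is.na(x)))
a <- crossprod(as.matrix(CrossFemMalVar1_3[keep, ]))

# Equivalent filter using complete.cases().
b <- crossprod(as.matrix(CrossFemMalVar1_3[complete.cases(CrossFemMalVar1_3), ]))
a
```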
>
> I'm not sure if I follow why you want to know about statistical  
> significance here. Do you really think of the species in your study  
> as a sample from a larger population of plant species, which you  
> are trying to generalise about?
>
> If so, is the population much larger than your sample? And was your  
> sample of species selected randomly, i.e. with equal selection  
> probabilities? If not, standard tests probably won't apply.
>
> Regards,
> James
>
>
> On 2/10/07 2:44 AM, Birgit Lemcke wrote:
>> Hello James,
>> first I have to thank you for your help, but there are some things  
>> I don't understand now.
>> I am not sure if I understand what this example gives me back:
>> ratings <- data.frame(id = c(1,2,3,4), att1 = c(1,1,0,1), att2 = c 
>> (1,0,0,1), att3 = c(0,1,1,1))
>> ratings
>>     id att1 att2 att3
>> 1  1    1    1    0
>> 2  2    1    0    1
>> 3  3    0    0    1
>> 4  4    1    1    1
>> tab <- crossprod(as.matrix(ratings[,-1]))
>> tab <- tab - diag(diag(tab))
>> tab
>>        att1 att2 att3
>> att1    0    2    2
>> att2    2    0    1
>> att3    2    1    0
>> As I understand it, this gives me the number of times we find the same  
>> value, for example when comparing att1 and att2 across all ids. Is that  
>> right?
>> What is this line doing: tab <- tab - diag(diag(tab))
>> And what does the original output of crossprod mean:
>>       att1 att2 att3
>> att1    3    2    2
>> att2    2    2    1
>> att3    2    1    3
>> I tried to do this with a part of my dataset
>> I used a table with 3 variables (only binary)
>> In the first part of the table I have the females (348 rows) and  
>> in the second part the males (also 348 rows).
>> Then I tried this:
>> CrossFemMal1_3<-crossprod(as.matrix(CrossFemMalVar1_3))
>> The output:
>> CrossFemMal1_3
>>       V1 V2 V3
>> V1 NA NA NA
>> V2 NA NA NA
>> V3 NA NA NA
>> There was one row of NAs in my dataset. I presume this is  
>> responsible for the NA results? So how can I deal with NAs here?
>> If I use two matrices (male and female) I get back, among other  
>> things, the comparison of att1 male to att1 female. If I use  
>> the percentage table output I get, for example,  
>> 40%. Can I then say that if the percentage is lower than 50% the  
>> attributes are significantly different?
>> Corresponding to your other suggestion:
>> sapply(c("1","2","3"), function(x) ifelse(regexpr(x, FemV1) > 0,  
>> 1, 0))
>> It gives me this output
>>          1  2  3
>>   [1,]  1  0  0
>>   [2,]  1  0  0
>>   [3,]  1  0  0
>>   [4,]  1  0  0
>>   [5,]  1  0  0
>>   [6,]  1  0  0
>>   [7,]  1  0  0
>>   [8,]  1  0  0
>>   [9,]  0  1  0
>>      .     .   .   .
>>      .     .   .   .
>> I think now I should count the ones for 1, 2 and 3?
>> I tried to use table but it gives me only the counts for 1 and zero:
>> table(FemV1Test)
>> FemV1Test
>>   0   1
>> 657 387
>> How can I specify that it gives me the counts for every column?
>> And then do the same for MalV1 and compare both somehow?
>> Thanks again in advance for your help.
>> Greetings Birgit
>> On 29.09.2007 at 14:45, James Reilly wrote:
>>>
>>> Hi Birgit,
>>>
>>> The first argument to regexpr should be just one character value,  
>>> not a vector. Your call:
>>> regexpr(c("1","2","3"),FemV1)
>>> seems to have been interpreted as:
>>> regexpr("1",FemV1)
>>>
>>> I think you probably need something more like:
>>> sapply(c("1","2","3"), function(x) ifelse(regexpr(x, FemV1) > 0,  
>>> 1, 0))
>>> This will also work on multiple response data such as
>>> FemV1 <- c("13", "2", "13", "123", "1", "23")
>>> Then colSums will give you frequency counts for each attribute.
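
Put together, the suggestion above can be sketched as follows, using the multiple response vector given in the message. The grepl() variant and the name dummies2 are additions for illustration only. Both versions assume single-digit codes; a two-digit code such as "12" would match both "1" and "2".

```r
# Multiple response data from the message above.
FemV1 <- c("13", "2", "13", "123", "1", "23")

# One 0/1 dummy column per code: regexpr() returns -1 when there is no match.
dummies <- sapply(c("1", "2", "3"),
                  function(x) ifelse(regexpr(x, FemV1) > 0, 1, 0))
dummies

# Frequency count for each code.
counts <- colSums(dummies)
counts

# Equivalent dummies with grepl(), which returns TRUE/FALSE directly.
dummies2 <- sapply(c("1", "2", "3"),
                   function(x) as.numeric(grepl(x, FemV1, fixed = TRUE)))
```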
>>>
>>> I think you would need to greatly simplify the multiple response  
>>> data to apply anything like a paired t-test. Have you considered  
>>> just crosstabulating the attributes of male plants versus the  
>>> females? For some R code, see
>>> https://stat.ethz.ch/pipermail/r-help/2007-February/126125.html
>>>
>>> Regards,
>>> James
>>>
>>>
>>> On 29/9/07 3:37 AM, Birgit Lemcke wrote:
>>>> Hello James,
>>>> sorry that I have to ask you a second time, but I don't  
>>>> understand what regexpr() is doing and how the syntax works.
>>>> I have a vector that I converted to a character string:
>>>> as.character(FemV1)
>>>>  [1] "1"  "1"  "1"  "1"  "1"  "1"  "1"  "1"  "2"
>>>> After that I did this, but without knowing what I am really doing:
>>>> regexpr(c("1","2","3"),FemV1)
>>>> The output looked like this:
>>>>  [1]  1  1  1  1  1  1  1  1 -1
>>>> As I understood it, the function looks for 1, 2 or 3 in this case.  
>>>> If there is a match it gives me back 1; if not, it gives me back -1.
>>>> But I don't know how this helps me now, so I hope you will  
>>>> explain it to me.
>>>> And there is another problem I have. For the continuous variables  
>>>> I used a paired t-test. Can I also perform this approach paired?
>>>> Thanks a lot in advance.
>>>> Greetings
>>>> Birgit
>>>> On 21.09.2007 at 11:38, James Reilly wrote:
>>>>>
>>>>> If I understand you right, you have several multiple response  
>>>>> variables (with the responses encoded in numeric strings) and  
>>>>> you want to see whether these are associated with sex. To  
>>>>> tabulate the data, I would convert your variables into  
>>>>> collections of dummy variables using regexpr(), then use table 
>>>>> (). You can use a modified chi-squared test with a Rao-Scott  
>>>>> correction on the resulting tables; see Thomas and Decady  
>>>>> (2004). Bootstrapping is another possible approach.
>>>>>
>>>>> @article{,
>>>>> Author = {Thomas, D. Roland and Decady, Yves J.},
>>>>> Journal = {International Journal of Testing},
>>>>> Number = {1},
>>>>> Pages = {43 - 59},
>>>>> Title = {Testing for Association Using Multiple Response Survey  
>>>>> Data: Approximate Procedures Based on the Rao-Scott Approach.},
>>>>> Volume = {4},
>>>>> Year = {2004},
>>>>> Url = {http://search.ebscohost.com/login.aspx?direct=true&db=pbh&AN=13663214&site=ehost-live}
>>>>> }
>>>>>
>>>>> Hope this helps,
>>>>> James
>>>>> -- 
>>>>> James Reilly
>>>>> Department of Statistics, University of Auckland
>>>>> Private Bag 92019, Auckland, New Zealand
>>>>>
>>>>> On 21/9/07 7:14 AM, Birgit Lemcke wrote:
>>>>>> First thanks for your answer.
>>>>>> Now I try to explain better:
>>>>>> I have species in the rows and morphological attributes in  
>>>>>> the columns, coded by numbers (qualitative variables; nominal  
>>>>>> and ordinal).
>>>>>> One table holds the male plants of every species and the  
>>>>>> other table the female plants of every species. The  
>>>>>> variables contain every possible occurrence in this species  
>>>>>> and this gender.
>>>>>> I would like to compare every variable between male and female  
>>>>>> plants, for example using a chi-squared test.
>>>>>> The null hypothesis could be: variable male is equal to  
>>>>>> variable female.
>>>>>> The question behind all this is whether male and female plants in  
>>>>>> this species are significantly different, and which attributes are  
>>>>>> responsible for this difference.
>>>>>> I really hope that this is easier to understand. If not,  
>>>>>> please ask.
>>>>>> Thanks a million in advance.
>>>>>> Greetings
>>>>>> Birgit
>>>>>
>>>> Birgit Lemcke
>>>> Institut für Systematische Botanik
>>>> Zollikerstrasse 107
>>>> CH-8008 Zürich
>>>> Switzerland
>>>> Ph: +41 (0)44 634 8351
>>>> birgit.lemcke at systbot.uzh.ch

Birgit Lemcke
Institut für Systematische Botanik
Zollikerstrasse 107
CH-8008 Zürich
Switzerland
Ph: +41 (0)44 634 8351
birgit.lemcke at systbot.uzh.ch