[R] Newbie question regarding graphing of Princomp object
    Tobias Verbeke 
    tobias.verbeke at telenet.be
       
    Sat Jan 15 09:47:06 CET 2005
    
    
  
On Sat, 15 Jan 2005 05:39:00 +0100
List account <lists at norvelle.org> wrote:
> Greetings,
> 
> I am working on a stylometric analysis of some Latin texts; one of the  
> latest stylometric techniques involves using principal components  
> analysis.  Not being a statistician, I can't really fully rely on PCA  
> as my primary tool, since I don't really understand the statistics  
> behind the PCA technique.  Nevertheless, the ability to use PCA and  
> graph the results has been marvelously helpful as a preliminary  
> technique to determine what kinds of stylometric variables are worth  
> pursuing as indicators of authorship.
> 
> For instance, I'm doing the following...  I have a set of data for  
> approximately 120 different Latin works, about half of which are by St.  
> Thomas Aquinas, and the other half are by various other authors in the  
> Thomistic tradition, some known and some anonymous.  My data for  
> frequencies of prepositions looks like the following:
> 
> A,AD,CIRCA,CUM,DE, .... (total of 10 variables)
> 1,0.00967667222531036,0.0208124884194923,0.00142671854734112,0.00486381322957198,0.00758291643505651 ...
> 2,0.00874917700292081,0.0217315416668508,0.00133005165549453,0.00437900727772451,0.00537323193714733 ....
> 3,0.0064258378627327,0.0280901956627422,0.00178739176045295,0.00430582309573329,0.00821688482105979 ....
> 4,0.00706850368364528,0.027446604903448,0.000821141574836712,0.00461761547172807,0.00812783899774761 ....
> 5,0.010214039424891,0.015409971157808,0.000745993537614122,0.00584650749246416,0.00475787738815518 ....
> 6,0.00952534711010655,0.0180981595092025,0.00125928317726832,0.00515014530190507,0.00447206974491443 ...
> .... (and so on for the rest of the 120 works)
> 
> The works are numbered such that works 100 and below are by St. Thomas,  
> those from 101 to 117 are of dubious authenticity, and those from 118  
> to 179 are by other authors.
> 
> When I perform a biplot on the results of the princomp() function, I  
> get a nice graph that plots the 120 works on the two principal  
> component axes (I've figured out how to get rid of the red arrows  
> already).  Given that the data points tend to jumble together, I'd like  
> some way to color the different categories of works in the biplot, so  
> that data points for works 1-100 are red, those from 101-117 are blue,  
> and those from 118 to 179 are green (for instance).
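You can use a `col' argument, though not in the biplot call itself:
as far as I can tell from ?biplot, `col' there takes at most two colors
(one for the observations, one for the variables), so it cannot color
the works by group. Plotting the scores on the first two components
directly does accept one color per point. In this case, assuming `mypca'
is the object returned by princomp(), I would do something like 
plot(mypca$scores[, 1:2], col = c(rep("red", 100), rep("blue", 17), rep("green", 62)))
Note that this assumes 179 rows in work-number order; the counts have to
match the rows actually analysed.
For a list of built-in color names, you can type colors() at the R prompt.
For more information on biplot, type ?biplot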
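
A self-contained sketch of coloring the works by category, using made-up
data in place of the real preposition frequencies. The names work_id,
freqs, pca, grp and cols are hypothetical, and the work numbers are
assumed to be available as a vector. It derives each color from the work
number rather than from row counts, so it still works when some numbers
between 1 and 179 are missing:

```r
# Made-up data standing in for the real frequencies (assumption):
# 120 works, 10 variables, work numbers drawn from 1..179.
set.seed(1)
work_id <- sort(sample(1:179, 120))
freqs <- matrix(runif(120 * 10, 0, 0.03), nrow = 120,
                dimnames = list(as.character(work_id), paste0("V", 1:10)))

pca <- princomp(freqs)

# Map each work number to its category, then to a color:
# 1-100 Aquinas, 101-117 dubious, 118-179 other authors.
grp <- cut(work_id, breaks = c(0, 100, 117, 179),
           labels = c("Aquinas", "dubious", "other"))
cols <- c("red", "blue", "green")[as.integer(grp)]

# Plot the scores on the first two components, labelled by work number.
plot(pca$scores[, 1:2], type = "n", xlab = "Comp.1", ylab = "Comp.2")
text(pca$scores[, 1:2], labels = work_id, col = cols, cex = 0.7)
legend("topright", legend = levels(grp),
       text.col = c("red", "blue", "green"), bty = "n")
```

This draws only the observations, without the variable arrows that
biplot() adds.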
VaRiis modis bene fit.
HTH,
Tobias
> I've included a sample of the output that I'm currently getting, in  
> case it's helpful to anybody.  BTW, I am running RAqua (for the Mac),  
> version 1.8.1.
> 
> Thanks in advance for any help!
> 
> -Erik Norvelle
> erik (at) norvelle (dot) org
> Facultad de Filosofía y Letras
> Universidad de Navarra
> Pamplona, Navarra, España
> 
>
    
    