[R] R usage for log analysis
    Allen S. Rout 
    asr at ufl.edu
       
    Mon Jun 12 06:44:51 CEST 2006
    
    
  
"Gabriel Diaz" <gabidiaz at gmail.com> writes:
> and what is the correct path to do it?
> 
> I mean, put log files in a mysql or something like that, and then
> make R use that data, using the data from the files directly?
I haven't put anything in a DB yet, so I'm not sure how much database
know-how gets used under the covers.
> pre-parse the log files to accommodate them to R?

Probably not; a little familiarity with the reading functions will
obviate most needs to pre-parse.
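As a sketch of what skipping the pre-parse step can look like, here is a hypothetical space-separated log whose request field is wrapped in double quotes (the format and column names are invented for illustration; real formats such as Apache's differ). read.table() splits on whitespace and honors the quotes without any help:

```r
# Two lines in a hypothetical log format: host, date, time, quoted request,
# status code, bytes sent.
lines <- c(
  '10.0.0.1 2006-06-12 06:44:51 "GET /index.html HTTP/1.0" 200 512',
  '10.0.0.2 2006-06-12 06:45:02 "GET /missing HTTP/1.0" 404 0'
)

# The default sep = "" splits on runs of whitespace; the default quote
# characters keep the quoted request together as a single field.
hits <- read.table(text = lines,
                   col.names = c("host", "date", "time",
                                 "request", "status", "bytes"),
                   stringsAsFactors = FALSE)

print(hits$request)
print(hits$status)
```

Numeric-looking fields such as status and bytes come back as integer columns, so no separate conversion pass is needed.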
> I need faqs, manuals, books, whatever to learn about this, can anyone
> give some advice?
[...]
Don't expect a warm welcome.  This community is like all open-source
communities, sharply focused on its own concerns and expertise.  And,
in an unusual experience for computer types, our core competencies
hold little or no sway here; they don't even give us much of a leg up.
Just wait 'til you want to do something nutso like produce a business
graphic. :)
I'm working on understanding enough of R packaging and documentation
to begin a 'task view' focused on systems administration, for humble
submission. That might end up being mostly "log analysis"; the term
can describe much of what we do, if it's stretched a bit.  I'm hoping
the task view will attract the teeming masses of sysadmins trapped in
the mire of Gnuplot and friends.
For starters, become familiar with read.table(); with a few
variations it will take care of all the

    while (<>) { @blah = split(/,/); etc. etc. etc. }

splitting you've been accustomed to doing.
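For comma-separated records, that Perl loop reduces to a single call. A minimal sketch, assuming a hypothetical file of user,host,bytes lines (written to a temporary file here only so the example is self-contained):

```r
# Write a small hypothetical comma-separated log to a temporary file.
logfile <- tempfile(fileext = ".log")
writeLines(c("alice,host1,1024",
             "bob,host2,2048",
             "alice,host2,512"), logfile)

# One call replaces the while (<>) { split(/,/) } loop: every line is split
# on commas and each field lands in its own column (V1, V2, V3 by default).
recs <- read.table(logfile, sep = ",", stringsAsFactors = FALSE)

str(recs)
unlink(logfile)  # the data frame stays in memory after the file is removed
```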
Name columns;  this makes it easier to think about your data.  
names(my_data)<-c("column","names","can","be","assigned","to")
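Names can also be supplied at read time through read.table()'s col.names argument; either way, columns are then addressed by what they mean rather than by position. A small sketch with invented data:

```r
# Hypothetical comma-separated records: user, host, bytes.
recs <- read.table(text = c("alice,host1,1024",
                            "bob,host2,2048",
                            "alice,host2,512"),
                   sep = ",", stringsAsFactors = FALSE)

# Assign names after the fact (equivalently: col.names = c(...) in the call).
names(recs) <- c("user", "host", "bytes")

# Named columns read like the question being asked: bytes per user.
total_by_user <- tapply(recs$bytes, recs$user, sum)
print(total_by_user)
```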
Start thinking of your data in generic sets, as opposed to specific
rows.  Situations which required iteration over specific rows in
Perl-land become single whole-column assignments in R.  For example, if
you've got records with dates and times and you want to work with time
structures:
in Perl you'd write
foreach (...) 
{$foo->{pdate} = parsedate($foo->{date}." ".$foo->{time})}
or some such.  In R-land, the iteration is implicit.  Here's a snippet
from something I'm using:
a$pdate<-as.POSIXct(paste(format(a$dte,"%Y/%m/%d"),a$time)) 
You're really acting on logical columns all at once here.  This is
fantastically more efficient in terms of your thought processes.  
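To watch the whole-column version run end to end, here is a self-contained sketch. The data frame a and its dte (a Date) and time (an "HH:MM:SS" string) columns are assumptions modeled on the snippet above. Passing an explicit format and time zone makes the parse independent of as.POSIXct()'s format guessing, and the follow-up selection is again a set operation rather than a loop:

```r
# Hypothetical records: a Date column and a character time-of-day column.
a <- data.frame(dte  = as.Date(c("2006-06-12", "2006-06-12", "2006-06-13")),
                time = c("06:44:51", "23:59:59", "00:15:00"),
                stringsAsFactors = FALSE)

# Whole-column conversion: paste() and format() are vectorized, so this
# builds one timestamp per row with no explicit loop.
a$pdate <- as.POSIXct(paste(format(a$dte, "%Y/%m/%d"), a$time),
                      format = "%Y/%m/%d %H:%M:%S", tz = "UTC")

# Set-style selection: a logical vector picks every matching row at once.
cutoff <- as.POSIXct("2006-06-12 12:00:00", tz = "UTC")
late <- a[a$pdate > cutoff, ]
print(late)
```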
- Allen S. Rout
    
    
More information about the R-help mailing list