[R] import file formatted RFC-822
    Barry Rowlingson 
    b.rowlingson at lancaster.ac.uk
       
    Tue Apr 13 19:54:26 CEST 2010
    
    
  
On Tue, Apr 13, 2010 at 6:26 PM, Sebastian Kruk <residuo.solow at gmail.com> wrote:
> Dear R-list users:
>
> I would like to import a database of web robots,
> http://www.robotstxt.org/db/all.txt, which is formatted as RFC-822. How can
> I do it?
The RFC 822 layout ("Field: value" lines) looks very much like R's package
DESCRIPTION files, which R reads with read.dcf() because they follow the
'Debian Control File' (DCF) format. So I tried read.dcf on it:
> robots = read.dcf("all.txt")
> dim(robots)
[1] 298  38
So that's a character matrix with 298 records (rows) and 38 fields (columns):
> dimnames(robots)
[[1]]
NULL
[[2]]
 [1] "robot-id"                  "robot-name"
 [3] "robot-cover-url"           "robot-details-url"
 [5] "robot-owner-name"          "robot-owner-url"
 [7] "robot-owner-email"         "robot-status"
 [9] "robot-purpose"             "robot-type"
[11] "robot-platform"            "robot-availability"
[13] "robot-exclusion"           "robot-exclusion-useragent"
[15] "robot-noindex"             "robot-host"
[17] "robot-from"                "robot-useragent"
[19] "robot-language"            "robot-description"
[21] "robot-history"             "robot-environment"
[23] "modified-date"             "modified-by"
[25] "robot-nofollow"            "robot-owner-name2"
[27] "robot-owner-url2"          "robot-owner-email2"
[29] "robot-owner-name3"         "robot-owner-name4"
[31] "robot-environment1"        "robot-environment2"
[33] "robot-purpose1"            "robot-purpose2"
[35] "robot-purpose3"            "robot-platform1"
[37] "robot-description1"        "robot-description2"
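The dimnames output shows that the rows are unnamed. One way to make the result easier to query is to turn the matrix into a data frame keyed by robot-id. The sketch below is only an illustration: it runs on a small hypothetical sample (made-up robot names, read through textConnection() instead of the real all.txt), and it assumes robot-id is unique per record.

```r
# Hypothetical two-record sample in the same "field: value" layout as all.txt;
# a blank line separates records.
sample_txt <- c(
  "robot-id: examplebot",
  "robot-name: Example Bot",
  "",
  "robot-id: otherbot",
  "robot-name: Other Bot"
)

con <- textConnection(sample_txt)
robots <- read.dcf(con)   # character matrix, one row per record
close(con)

# Convert to a data frame and use robot-id as row names
# (assumes each robot-id occurs only once).
robots_df <- as.data.frame(robots, stringsAsFactors = FALSE)
rownames(robots_df) <- robots_df[["robot-id"]]

# Hyphenated field names are kept as-is, so index with strings or backticks.
robots_df["otherbot", "robot-name"]
robots_df$`robot-name`
```

With the real file, replace the textConnection() with read.dcf("all.txt") as in the transcript.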
read.dcf() also pads out the columns so that every row has an entry for every
field that occurs anywhere in the file; where a robot's record lacks a field,
the value is NA.
Sorted?
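Because absent fields come back as NA, counting NAs per column shows how sparsely each field is populated, which helps decide whether rare columns such as robot-owner-name4 are worth keeping. This is a sketch on a hypothetical three-record sample, and the 50% cutoff is an arbitrary illustrative choice, not anything read.dcf() prescribes.

```r
# Hypothetical three-record sample; not every record has every field.
sample_txt <- c(
  "robot-id: examplebot",
  "robot-name: Example Bot",
  "robot-language: perl",
  "",
  "robot-id: otherbot",
  "robot-name: Other Bot",
  "",
  "robot-id: thirdbot"
)
con <- textConnection(sample_txt)
robots <- read.dcf(con)
close(con)

# Fields absent from a record are NA, so count them per column.
missing_per_field <- colSums(is.na(robots))
missing_per_field

# Keep only fields present in at least half of the records
# (illustrative threshold).
keep <- missing_per_field / nrow(robots) <= 0.5
robots_core <- robots[, keep, drop = FALSE]
colnames(robots_core)
```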
Barry