[R] RFC for package PopCon: a popularity contest for R and packages
Jeffrey Horner
jeff.horner at vanderbilt.edu
Thu Feb 14 15:57:47 CET 2008
(I posted this to the R-devel list yesterday, but I thought others on
this list would be interested, so sorry for those who get it twice.)
Hello all,
I've developed a prototype package called PopCon (short for popularity
contest), a package for tracking the popularity of R and its packages.
I'd like this work to be similar in spirit to the Debian package
popularity-contest: http://popcon.debian.org/.
Once PopCon is loaded, it captures two kinds of information from the
user and stores it in a cache: the names of the packages he/she
loads, and the names of symbols requested from his/her code. Once the
cache is full, the goal is to flush the data to a central server for
storage, free for anyone to download and analyze. That's it. It's
simple to use and works behind the scenes. You can get the prototype here:
http://biostat.mc.vanderbilt.edu/twiki/pub/Main/JeffreyHorner/PopCon_0.1.tar.gz
Note that flushing the cache to a server is NOT TURNED ON, so PopCon
WON'T FORWARD ANY DATA ANYWHERE! When the cache fills up, its contents
are simply deleted.
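
To make the cache behaviour described above concrete, here is a minimal
sketch of a bounded cache that hands its contents to a callback when it
fills up and then empties itself. This is not PopCon's actual code: the
names make_cache, record, get and on_full are made up for illustration,
and the default callback does nothing, which mirrors the prototype's
current "only gets deleted" behaviour.

```r
# Hypothetical sketch of a bounded name cache; NOT PopCon's internals.
# 'on_full' is an assumed callback that receives the cached names when the
# cache reaches capacity. The default discards them without sending anything.
make_cache <- function(capacity = 100L,
                       on_full = function(names) invisible(NULL)) {
  buf <- character(0)
  list(
    record = function(name) {
      buf <<- c(buf, name)
      if (length(buf) >= capacity) {
        on_full(buf)            # a real client might upload the data here
        buf <<- character(0)    # the cache is emptied either way
      }
      invisible(NULL)
    },
    get = function() buf
  )
}

cache <- make_cache(capacity = 3L)
cache$record("package:cluster")
cache$record("search")
cache$get()          # two entries so far
cache$record("::")   # the third entry fills the cache, which is then emptied
cache$get()          # now empty
```

Keeping the flush step behind a callback would let the upload be switched
on later without touching the code that records names.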
So, I envision all the software and data generated and stored being
licensed under the GPL and a Creative Commons license, or even released
into the public domain.
Thoughts? I'm looking for volunteers, because there are many issues to
hash out. Here are a few of them:
1. Obviously storing IP addresses or any bit of personal information is
out, but I'm interested in generating a permanent random key of some
sort so that data from the same R install can be tracked. I'm wondering
if just md5 hashing the combination of R version, platform, and IP
address would be appropriate and reproducible per R install. The Debian
package popularity-contest has the benefit of installing an '/etc'
config file and generating the key once, while I'd like PopCon users to
just call 'library(PopCon)' and do nothing else.
2. I'm willing to maintain the central server and work on the
infrastructure, but help will definitely be needed. Also, if there's
significant interest, maybe R core would be interested in this.
3. What exactly is PopCon tracking as far as symbol names go? It
currently uses an R_ObjectTable object attached to the search path to
capture names, but is this the best way? See
http://www.omegahat.org/RObjectTables/. It's also replacing
base::getHook to trap library loads.
4. What else would be interesting to track? Some folks have suggested
various bits of R.Version() output.
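
As a starting point for issues 1 and 4, here is a rough sketch of how an
install key could be derived by md5 hashing a few R.Version() fields.
The function name install_key and its 'extra' argument are hypothetical
and not part of PopCon. 'extra' stands in for any additional input such
as an IP address. It is left empty here, and note that any input that
changes over time (an IP address can) would change the key as well.

```r
# Hypothetical helper, NOT part of PopCon: md5-hash a few fields that
# describe the R build. 'extra' is an assumed placeholder for further input.
install_key <- function(extra = "") {
  v <- R.Version()
  fields <- c(v$version.string, v$platform, extra)
  tmp <- tempfile()
  on.exit(unlink(tmp))
  # tools::md5sum() hashes files, so write the exact bytes to a temp file.
  writeBin(charToRaw(paste(fields, collapse = "|")), tmp)
  unname(tools::md5sum(tmp))
}

key <- install_key()
nchar(key)   # an md5 digest is 32 hexadecimal characters
```

The same inputs always give the same key, so nothing would need to be
stored on disk, but every install with the same R version and platform
would also share a key unless something distinguishing is added.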
Here's what PopCon can currently do:
> library(PopCon)
> search()
[1] ".GlobalEnv" "package:PopCon" ".pcUDB"
[4] "package:stats" "package:graphics" "package:grDevices"
[7] "package:utils" "package:datasets" "package:methods"
[10] "Autoloads" "package:base"
# Notice the above search entry .pcUDB. That's the R Object Table
> typeof(PopCon::getCache())
[1] "character"
> PopCon::getCache()
[1] ".conflicts.OK" "search" "::"
# Now the cache contains the name 'search', which I called above,
# and the double colon operator.
> library(cluster)
> any(PopCon::getCache()=='package:cluster')
[1] TRUE
# Package names are represented in the PopCon cache just like
# their name on the search path.
> PopCon::getCache()
[1] ".conflicts.OK" "search"
[3] "::" "$.data.frame"
[5] "$.default" "$.data.frame"
[7] "$.default" "unique.integer"
[9] "unique.numeric" "$.data.frame"
[11] "$.default" "unique.integer"
[13] "unique.numeric" "unique.character"
[15] "unique.integer" "unique.numeric"
[17] "close.gzfile" "$.packageDescription2"
[19] "$.default" "$.data.frame"
[21] "$.default" "unique.integer"
[23] "unique.numeric" "unique.character"
[25] "unique.integer" "unique.numeric"
[27] "close.gzfile" "$.packageDescription2"
[29] "$.default" "unique.integer"
[31] "unique.numeric" "close.gzfile"
[33] "names.simple.list" "names.default"
[35] "[.default" "as.character.simple.list"
[37] "as.vector.simple.list" "as.vector.default"
[39] "unique.character" "$.packageDescription2"
[41] "$.default" ">=.R_system_version"
[43] "Ops.R_system_version" ">=.package_version"
[45] "Ops.package_version" ">=.numeric_version"
[47] ">=.package_version" "Ops.package_version"
[49] ">=.numeric_version" "unlist.R_system_version"
[51] "unlist.package_version" "unlist.numeric_version"
[53] "unlist.default" "unlist.package_version"
[55] "unlist.numeric_version" "unlist.default"
[57] "as.list.R_system_version" "as.list.package_version"
[59] "unique.integer" "unique.numeric"
[61] "as.list.R_system_version" "as.list.package_version"
[63] "unique.integer" "unique.numeric"
[65] "as.list.package_version" "unique.integer"
[67] "unique.numeric" "as.list.package_version"
[69] "unique.integer" "unique.numeric"
[71] ">=.default" "$.packageDescription2"
[73] "$.default" "<.R_system_version"
[75] "Ops.R_system_version" "<.package_version"
[77] "Ops.package_version" "<.numeric_version"
[79] "unique.character" "unlist.R_system_version"
[81] "unlist.package_version" "unlist.numeric_version"
[83] "unlist.default" "unlist.numeric_version"
[85] "unlist.default" "as.list.R_system_version"
...
# I've truncated the output here.
But you get the idea. Any and all comments welcome.
Jeff
--
http://biostat.mc.vanderbilt.edu/JeffreyHorner