[Rd] best way to extract this meaningful data from a table
bryan rasmussen
rasmussen.bryan at gmail.com
Tue Feb 19 01:19:19 CET 2013
I have a table with a structure like the following:
lang | basic id | doc id | topics|
se | 447157 | MD_2002_0014 |12 |
loaded_topics <- read.table("path to file", header=TRUE, sep="|",
                            fileEncoding="utf-8")
In that table the meaningful data (in this context) are the text
before the first underscore in doc id, which is the document type (for
example MD above), and topics.
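
To make the document type concrete, here is a rough sketch of the
extraction I have in mind. The vector name doc_id and the second value
are made up for illustration; only MD_2002_0014 comes from my table.

```r
# Hypothetical doc id values in the format shown above.
doc_id <- c("MD_2002_0014", "XY_2001_0007")

# Drop the first underscore and everything after it,
# leaving only the document type prefix.
doc_type <- sub("_.*$", "", doc_id)
```

This relies on the prefix itself never containing an underscore, which
holds for the ids I have seen so far.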
However, topics can hold more than one value; multiple values are
comma separated. If a document has no topic I have a 0, although I
could also leave the column empty if I want.
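
As a sketch of the counting this implies, assuming the topics column is
read as character (the sample values and the names topics, parts and
n_topics are hypothetical):

```r
# Hypothetical topics values: one topic, three topics, and the two
# ways of saying "no topics" ("0" or an empty field).
topics <- c("12", "3,7,19", "0", "")

# Trim stray whitespace, then split each entry on commas.
clean <- trimws(topics)
parts <- strsplit(clean, ",", fixed = TRUE)

# Count the pieces per row, then force "0" and "" to mean zero topics.
n_topics <- vapply(parts, length, integer(1))
n_topics[clean %in% c("0", "")] <- 0L
```

The explicit override matters because a bare "0" would otherwise be
counted as one topic.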
So what I want is the best way to extract the meaningful data, that is,
the comma separated values of each topics column and the document type,
so that I can start to do reports such as how many documents of type X
have no topics, the median number of topics per document type, etc.
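
To show the kind of report I am after, here is a sketch that assumes
the document type and the topic count were already available as two
vectors. The values and the names doc_type and n_topics are invented
for illustration, and I do not know whether this is the idiomatic
approach.

```r
# Hypothetical per-document values, one entry per row of the table.
doc_type <- c("MD", "MD", "MD", "XY", "XY")
n_topics <- c(1L, 0L, 3L, 0L, 0L)

# How many documents of each type have no topics.
no_topics <- tapply(n_topics == 0L, doc_type, sum)

# Median number of topics per document type.
median_topics <- tapply(n_topics, doc_type, median)
```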
Do I have to loop through the table and build a new table up with the
info I want, or is there a smarter way to do it?
If there is a smarter way, what is it?
Thanks,
Bryan Rasmussen