[R] Reading .csv file under linux

Peter Dalgaard p.dalgaard at biostat.ku.dk
Wed Jan 23 01:10:38 CET 2008


David Scott wrote:
> I am a total dunce when it comes to encodings though. How do you find the 
> encoding of a file?
>   
You don't. Either you know it, or you are up the proverbial creek (or 
roof). The "8-bit ascii" encodings are one of the greater computer crimes 
of the last century precisely because the files contain no clue about 
which encoding they are in.

Well, not quite true. Those of us with non-ASCII letters in our 
language will know to look for certain tell-tale bytes or byte sequences 
(e.g. \xe6 is the Danish character 'æ' in latin-1, whereas \xc3 is A-tilde 
in latin-1 but more likely to be the lead byte of a UTF-8 multibyte 
sequence), IF we have an idea about the language the file is written in.
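
For what it is worth, the byte-hunting can be sketched in R. This is only 
a rough heuristic, not a detector: guess_encoding() is a made-up name, the 
three labels it returns are my own, and it assumes the file has no embedded 
nul bytes (rawToChar() would stop on those) and an R recent enough to have 
validUTF8(). Bytes that do not form valid UTF-8 are only reported as "some 
8-bit encoding"; deciding which one still needs knowledge of the language.

```r
# Hypothetical helper: classify raw bytes as ASCII, UTF-8, or "other 8-bit".
guess_encoding <- function(bytes) {
  stopifnot(is.raw(bytes))
  # No byte above 0x7f: plain ASCII, identical in latin-1 and UTF-8.
  if (!any(as.integer(bytes) >= 128L)) {
    return("ASCII")
  }
  # High bytes present: check whether they form well-formed UTF-8 sequences.
  if (validUTF8(rawToChar(bytes))) {
    return("UTF-8")
  }
  # A lone \xe6 and the like end up here: latin-1 or some other 8-bit set.
  "unknown 8-bit"
}

guess_encoding(as.raw(0xe6))           # 'æ' as a single latin-1 byte
guess_encoding(as.raw(c(0xc3, 0xa6)))  # 'æ' as a UTF-8 byte pair
```

For a real file the bytes could come from readBin(path, what = "raw", 
n = file.size(path)), where path is whatever file you have. If the guess 
and your knowledge of the language both point to latin-1, passing 
fileEncoding = "latin1" to read.csv() is the usual next step.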

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907


