[R] pdf() device uses fonts to represent points - data alteration?

Wed Oct 31 12:39:37 CET 2007

Thank you very much for your answer, even so long after I first  
posted the message.

On 2007-October-31, at 12:00, Paul Murrell wrote:
> Hi
>
> jiho wrote:
>> Hello all,
>> I discovered that the pdf device uses fonts to represent "points"
>> symbols (as in plot(...,type="p",...) ). Namely, it uses
>> ZapfDingbats with symbol U+25cf. This can lead to problems when
>> the font is not available, or available in another version (such
>> as points being replaced by other symbols or, worse, slightly
>> displaced). Furthermore, it also causes problems when opening the
>> pdf files for editing in other programs. I know that for
>> reproducibility one should avoid doing this, but there are cases
>> where R is simply not suited to produce the end result graphic
>> directly using code (e.g., replacing some colors by CMYK versions
>> for color consistency in print). In addition, publishers also often
>> like being able to retouch graphics to ensure font consistency
>> or such, and this will be destructive in the case of these pdfs.
>> For example, Inkscape interprets points as squares (more like
>> U+2751 in ZapfDingbats) and Adobe Illustrator does not even
>> recognize the font (substituting AdobePiStd).
>> I tried to embed fonts with embedFonts() but this does not solve
>> the issue with editing (Inkscape produces a kind of star and AI
>> still chokes on the font) and, worse, it modifies how the original
>> graphic renders in pdf viewers: the circles are now filled (I
>> believe this is because this is the default state of the
>> ZapfDingbats character).
>> So my questions are:
>> - does anyone have a workaround for this?
>> - why can't the pdf device use shapes instead of fonts to
>> represent data points? It would appear to be a much more robust
>> approach and would ensure that the points are rendered the same
>> everywhere. Font substitution in axis labels is not as bad since
>> it does not modify the data itself (at worst the labels are
>> offset a little bit) but font substitution on the data points can
>> really harm the graphic.
>
> If I recall correctly, the PDF device uses a character for small  
> circles because that looks better.  There is no PDF circle  
> primitive, so circles have to be drawn using bezier curves.  The  
> original author may be able to elaborate on that.

OK. I did suspect that PDF had no circle primitive.
That's a good reason.
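
For the record, here is a rough sketch of what "drawing a circle with
bezier curves" amounts to. It uses the textbook approximation of a
quarter circle by one cubic bezier segment; I am assuming this is
roughly what a PDF writer would do and have not checked what the pdf
device actually emits. The names kappa and bezier_quarter are mine.

```r
# Standard control-point offset for approximating a quarter circle
# with a single cubic Bezier segment (an assumption about the method,
# not taken from the pdf device source).
kappa <- 4 / 3 * (sqrt(2) - 1)

# Evaluate the quarter-circle Bezier segment at parameter values t.
bezier_quarter <- function(t, r = 1) {
  # Control points going from (r, 0) to (0, r)
  p <- rbind(c(r, 0), c(r, kappa * r), c(kappa * r, r), c(0, r))
  # Cubic Bernstein basis, one row per value of t
  b <- cbind((1 - t)^3, 3 * (1 - t)^2 * t, 3 * (1 - t) * t^2, t^3)
  b %*% p
}

xy <- bezier_quarter(seq(0, 1, length.out = 201))

# Largest deviation of the curve from the true unit circle
radial_error <- max(abs(sqrt(rowSums(xy^2)) - 1))
radial_error
```

The deviation from a true circle is a small fraction of a percent of
the radius, so four such segments should be visually indistinguishable
from a circle at the size of a plotting symbol.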

> Two suggestions for workarounds:
> (i)  produce PostScript and then convert to PDF using something  
> like ghostscript (e.g., ps2pdf)
> (ii)  use an almost-but-not-quite opaque colour, e.g., rgb(0, 0,  
> 0, .99) for the points.  If the points are not fully opaque, the  
> character is not used.

(ii) is really good to know (and I would probably never have found it  
myself). (i) is not applicable since I use PDF to keep transparency.
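
For the archives, a minimal sketch of workaround (ii) as I understand
it. The output file name and the random data are only placeholders;
the one thing that matters is the alpha of 0.99 in rgb(), which,
according to Paul, makes the device draw the points as shapes rather
than as font characters.

```r
# Placeholder output file; any path would do
out <- tempfile(fileext = ".pdf")

pdf(out)
# Almost-but-not-quite opaque black: alpha = 0.99 instead of 1
almost_black <- rgb(0, 0, 0, 0.99)
plot(rnorm(50), rnorm(50), type = "p", col = almost_black)
dev.off()
```

Opening the resulting file in Inkscape or Illustrator is the real test
of whether the points survive editing.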

Thanks for your help. I still think that not using fonts at all
should be preferred, because really strange things can happen with
fonts, while bezier curves are robust and do not depend at all on the
rest of the OS. In this particular case, robustness should probably
take precedence over appearance, since the data itself is affected.
Anyway, I'm fine now with your workaround. Maybe I should file a bug
report so that this can be addressed in a future release.

JiHO
---
http://jo.irisson.free.fr/