[Rd] [PATCH] Improve utf8clen and remove utf8_table4

Sahil Kang sahil.kang at asilaycomputing.com
Sun Mar 19 07:31:57 CET 2017


Given a char `c' which should be the start byte of a utf8 character,
the utf8clen function returns the byte length of the utf8 character.

Before this patch, the utf8clen function would return either:
     * 1 if `c' was an ASCII character or a utf8 continuation byte
     * An int in the range [2, 6] indicating the byte length of the utf8 character
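For reference, the pre-patch contract can be sketched roughly as follows. This is a hypothetical reconstruction, not the code in R's sources. The names old_utf8clen and old_table4 are made up. The table contents are an assumption: they store (length - 1) indexed by the low six bits of a start byte in the range 0xC0..0xFF.

```c
/* Hypothetical reconstruction of the pre-patch behaviour (not R's code).
 * Assumed layout: entry i holds (length - 1) for start byte 0xC0 + i. */
static const char old_table4[64] = {
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,  /* 0xC0..0xCF: 2-byte sequences */
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,  /* 0xD0..0xDF: 2-byte sequences */
    2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,  /* 0xE0..0xEF: 3-byte sequences */
    3,3,3,3,3,3,3,3,                  /* 0xF0..0xF7: 4-byte sequences */
    4,4,4,4,                          /* 0xF8..0xFB: 5-byte sequences */
    5,5,5,5                           /* 0xFC..0xFF: 6-byte sequences */
};

int old_utf8clen(char c)
{
    /* ASCII (0xxxxxxx) and continuation bytes (10xxxxxx) both land here,
     * so the two cases are indistinguishable to the caller. */
    if ((c & 0xc0) != 0xc0)
        return 1;
    return 1 + old_table4[c & 0x3f];
}
```

Under this sketch, old_utf8clen((char) 0x80) returns 1, exactly as old_utf8clen('A') does.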

With this patch, the utf8clen function will now return either:
     * -1 if `c' is not a valid utf8 start byte (e.g. a continuation byte)
     * The byte length of the utf8 character: 1 for an ASCII character, otherwise the number of leading 1 bits in `c'
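The new contract can be sketched by counting leading 1 bits directly, with no lookup table. This is a hypothetical sketch, since the actual patch.diff is not reproduced in the archive. The name new_utf8clen is made up, and rejecting 0xFE and 0xFF (7 or 8 leading 1 bits) is an assumption about what "not a valid utf8 start byte" covers.

```c
/* Hypothetical sketch of the patched contract (not the actual patch).
 * Returns the byte length of the character starting with c,
 * or -1 if c cannot start a character. */
int new_utf8clen(char c)
{
    unsigned char u = (unsigned char) c;
    int n = 0;

    if (u < 0x80)                    /* 0xxxxxxx: ASCII, one byte */
        return 1;
    while (u & 0x80) {               /* count the leading 1 bits */
        n++;
        u = (unsigned char) (u << 1);
    }
    /* n == 1: continuation byte (10xxxxxx).
     * n > 6: 0xFE or 0xFF, assumed invalid here. */
    if (n == 1 || n > 6)
        return -1;
    return n;
}
```

Working on an unsigned copy avoids depending on whether plain char is signed.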

I believe returning -1 for continuation bytes makes utf8clen less error-prone: a caller can no longer mistake a stray continuation byte for a one-byte character.
The utf8_table4 array is no longer needed and has been removed.
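To illustrate why a -1 return value helps, here is a hypothetical caller that counts the characters in a NUL-terminated string. Neither count_utf8_chars nor clen is an R function. clen restates the contract described in the list above, including the assumption that 0xFE and 0xFF are rejected.

```c
/* Hypothetical helper with the patched contract: length of the character
 * starting with c, or -1 if c cannot start a character. */
static int clen(char c)
{
    unsigned char u = (unsigned char) c;
    int n = 0;
    if (u < 0x80)
        return 1;                          /* ASCII */
    while (u & 0x80) {                     /* count the leading 1 bits */
        n++;
        u = (unsigned char) (u << 1);
    }
    return (n == 1 || n > 6) ? -1 : n;     /* continuation byte, 0xFE, 0xFF */
}

/* Hypothetical caller: number of characters in s, or -1 on malformed input. */
long count_utf8_chars(const char *s)
{
    long count = 0;
    while (*s) {
        int len = clen(*s);
        if (len < 0)
            return -1;                     /* byte cannot start a character */
        for (int i = 1; i < len; i++)      /* tail bytes must be 10xxxxxx */
            if (((unsigned char) s[i] & 0xC0) != 0x80)
                return -1;                 /* malformed, or truncated at NUL */
        s += len;
        count++;
    }
    return count;
}
```

Under the old contract, the stray byte in "\xA9" "abc" would be reported as length 1 and silently counted as a character. Here it surfaces as an error.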

Sahil
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch.diff
Type: text/x-patch
Size: 1709 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20170318/e8e82a14/attachment.bin>

