I recently wrote a conversion script and a PHP wrapper so that the data from Sean M. Burke's Perl "last-chance transliterator", Text::Unidecode, can be used from PHP: unidecode_php-0.3.tar.gz.
To use it, you'll need to install the Perl Text::Unidecode module and then run the udec2bin.pl script inside the unidecode_php package.
Example PHP usage
Comments 2
There's a little bug, though: the code initially converts the input to UCS-4BE (an older name for what is now usually called UTF-32BE), so every character becomes 4 bytes long. The function _unidecode_codepoint(), however, treats each character as 2 bytes long.
Looking at the original Perl code, it becomes pretty clear that the function handles two-byte Unicode characters, that is, UTF-16 (or UCS-2BE).
It would therefore make sense to convert the input to UTF-16 instead of UCS-4BE.
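To illustrate the size mismatch, here is a minimal sketch, assuming PHP 7 or later with the mbstring extension. The helper name codepoint_banks() is hypothetical and is not part of unidecode_php; it only shows how a big-endian two-byte encoding splits each character into a high byte and a low byte, which is presumably what a two-byte lookup expects.

```php
<?php
// Hypothetical helper, NOT part of unidecode_php: splits a UTF-8 string into
// [high byte, low byte] pairs, one pair per 16-bit big-endian code unit.
function codepoint_banks(string $utf8): array
{
    $utf16 = mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');
    if ($utf16 === '') {
        return [];
    }
    $pairs = [];
    foreach (str_split($utf16, 2) as $unit) {
        // 'C' reads one unsigned byte; the names become array keys.
        $bytes = unpack('Chigh/Clow', $unit);
        $pairs[] = [$bytes['high'], $bytes['low']];
    }
    return $pairs;
}

$text = "\u{00E9}\u{4E2D}"; // U+00E9 and U+4E2D, two characters

// UCS-4BE uses 4 bytes per character, the two-byte encoding uses 2.
echo strlen(mb_convert_encoding($text, 'UCS-4BE', 'UTF-8')), "\n";  // 8
echo strlen(mb_convert_encoding($text, 'UTF-16BE', 'UTF-8')), "\n"; // 4

print_r(codepoint_banks($text)); // [[0, 233], [78, 45]]
```

Note that characters outside the Basic Multilingual Plane become two code units (a surrogate pair) in UTF-16, so a two-byte lookup would see them as two separate entries.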
Best regards,
Gerd