🦋 Character sets
At work, I've been involved in a project to support the full Unicode character set in a more-than-cursory way*, getting to understand wide characters and utf-8 much more fully than I ever did before; and finally I am thinking I want to encode READIN in utf-8. All this time it has been in ISO-8859-1, which works ok as long as I escape unsupported Unicode characters; but it seems like time to get with the program. My question is, what's the easiest way to convert my data? A lot of posts have got characters like äöüæ... which are going to show up as garbage if I just change the encoding of the blog. I was thinking I would use mysqldump and use iconv to convert the data. But somehow the output from mysqldump is already encoded with utf8. Does this mean I can just rebuild the database from this output and I'll be good to go? I'm a little confused why mysqldump is not respecting the encoding in the database...
Well, restoring from the output of mysql-dump does not have the desired effect; characters that were ISO-8859-1 in the original db, that were UTF-8 in the dump, are converted back to ISO-8859-1 in the restore. After further investigation, it seems like my original idea will work: although it looks to me like iconv is essentially double-encoding the characters that were transformed to utf-8 by mysqldump, when I load them back into mysql I get utf-8 characters. Not totally comfortable with this yet...
* (Previously our support for Unicode had consisted of walking through utf-8 strings looking for high-order characters we recognized, and flattening them to 7-bit ASCII.)
posted morning of Saturday, March 13th, 2010 ➳ More posts about The site ➳ More posts about Programming Projects ➳ More posts about Projects ➳ More posts about Programming
|