The READIN Family Album
Adamastor, by Júlio Vaz Júnior

READIN

Jeremy's journal

Be quiet the doctor's wife said gently, let's all keep quiet, there are times when words serve no purpose, if only I, too, could weep, say everything with tears, not have to speak in order to be understood.

José Saramago


(This is a page from my archives)

🦋 Character sets

At work, I've been involved in a project to support the full Unicode character set in a more-than-cursory way*, and I've come to understand wide characters and UTF-8 much more fully than I ever did before. So I am finally thinking I want to encode READIN in UTF-8. All this time it has been in ISO-8859-1, which works OK as long as I escape unsupported Unicode characters; but it seems like time to get with the program.

My question is, what's the easiest way to convert my data? A lot of posts contain characters like äöüæ..., which will show up as garbage if I just change the encoding of the blog. I was thinking I would dump the database with mysqldump and use iconv to convert the data. But somehow the output from mysqldump is already encoded as UTF-8. Does this mean I can just rebuild the database from this output and I'll be good to go? I'm a little confused about why mysqldump is not respecting the encoding in the database...
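The conversion step planned above (roughly what `iconv -f ISO-8859-1 -t UTF-8` does to a file that really is ISO-8859-1) can be sketched in Python. This is only an illustration under that assumption, not the script actually run against READIN; the function name `latin1_to_utf8` and the sample string are made up for the example.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 (hypothetical helper).

    Every byte value is a valid ISO-8859-1 character, so the decode
    step never fails, even if the input was not really ISO-8859-1.
    """
    text = data.decode("iso-8859-1")
    return text.encode("utf-8")


if __name__ == "__main__":
    # Sample data: the characters mentioned in the post, as stored in ISO-8859-1.
    stored = "äöüæ".encode("iso-8859-1")
    converted = latin1_to_utf8(stored)
    print(len(stored), len(converted))  # one byte per character before, two after
```

Because the decode step cannot fail, a conversion like this gives no warning when the input is already UTF-8, which is worth keeping in mind given what mysqldump turned out to produce.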

Well, restoring from the output of mysqldump does not have the desired effect: characters that were ISO-8859-1 in the original db and UTF-8 in the dump are converted back to ISO-8859-1 in the restore.

After further investigation, it seems like my original idea will work: although it looks to me like iconv is essentially double-encoding the characters that mysqldump had already transformed to UTF-8, when I load them back into mysql I get UTF-8 characters. I'm not totally comfortable with this yet...
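The double-encoding described above can be reproduced in a few lines of Python. This is a sketch of the general effect, not a trace of what mysqldump, iconv, and mysql actually did here; the function names are hypothetical. It also shows one possible reason the round trip could come out right, which is a guess rather than something verified in the post: if the restore converts the double-encoded text back to ISO-8859-1, the bytes that end up stored are exactly the UTF-8 encoding of the original characters.

```python
def double_encode(text: str) -> bytes:
    """Encode text as UTF-8, then treat those bytes as ISO-8859-1 and
    encode to UTF-8 a second time (what converting an already-UTF-8
    dump from ISO-8859-1 would do)."""
    once = text.encode("utf-8")
    return once.decode("iso-8859-1").encode("utf-8")


def restore_as_latin1(data: bytes) -> bytes:
    """Assumed restore step: read the file as UTF-8 and store the
    resulting characters as ISO-8859-1 bytes."""
    return data.decode("utf-8").encode("iso-8859-1")


if __name__ == "__main__":
    original = "äöüæ"
    doubled = double_encode(original)
    stored = restore_as_latin1(doubled)
    # The stored bytes, read as UTF-8, are the original characters again.
    print(stored.decode("utf-8") == original)
```

If that guess is right, the column would still be labelled ISO-8859-1 while holding UTF-8 bytes, which may be why the result feels uncomfortable even though the pages display correctly.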

* (Previously our support for Unicode had consisted of walking through UTF-8 strings looking for high-order characters we recognized, and flattening them to 7-bit ASCII.)

posted morning of Saturday, March 13th, 2010