READIN: Character sets

(This is a page from my archives)

More posts about:

➻	The site
➻	Programming Projects
➻	Projects
➻	Programming

READIN started out as a place for me to keep track of what I am reading, and to learn (slowly, slowly) how to design a web site.

There has been some mission drift here and there, but in general that's still what it is. Some of the main things I write about here are reading books, listening to (and playing) music, and watching the movies. Also I write about the work I do with my hands and with my head; and of course about bringing up Sylvia.

The site is a bit of a work in progress. New features will come on-line now and then; and you will occasionally get error messages in place of the blog, for the foreseeable future. Cut me some slack, I'm just doing it for fun! And if you see an error message you think I should know about, please drop me a line. READIN source code is PHP and CSS, and available on request, in case you want to see how it works.

See my reading list for what I'm interested in this year.

READIN has been visited approximately 236,737 times since October, 2007.

🦋 Character sets

At work, I've been involved in a project to support the full Unicode character set in a more-than-cursory way*, getting to understand wide characters and utf-8 much more fully than I ever did before; and finally I am thinking I want to encode READIN in utf-8. All this time it has been in ISO-8859-1, which works ok as long as I escape unsupported Unicode characters; but it seems like time to get with the program.

My question is, what's the easiest way to convert my data? A lot of posts have got characters like äöüæ... which are going to show up as garbage if I just change the encoding of the blog. I was thinking I would use mysqldump and use iconv to convert the data. But somehow the output from mysqldump is already encoded with utf8. Does this mean I can just rebuild the database from this output and I'll be good to go? I'm a little confused why mysqldump is not respecting the encoding in the database...
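To make the "garbage" concrete, here is a minimal Python sketch (Python is my choice for illustration; the site itself is PHP) of what happens when bytes written in one encoding are read under the other label. The variable names are made up for the example.

```python
# The same four characters, stored two different ways.
text = "äöüæ"

latin1_bytes = text.encode("iso-8859-1")  # one byte per character
utf8_bytes = text.encode("utf-8")         # two bytes per character here

# ISO-8859-1 bytes served under a UTF-8 label are not valid UTF-8,
# so a tolerant reader substitutes U+FFFD replacement characters.
as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# The reverse mistake: UTF-8 bytes read as ISO-8859-1 turn every
# accented character into two unrelated ones.
as_latin1 = utf8_bytes.decode("iso-8859-1")

print(len(latin1_bytes), len(utf8_bytes))  # 4 8
print(as_latin1)                           # Ã¤Ã¶Ã¼Ã¦
```

So simply relabelling the pages is not enough; the stored bytes themselves have to change.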

Well, restoring from the output of mysqldump does not have the desired effect; characters that were ISO-8859-1 in the original db, and UTF-8 in the dump, are converted back to ISO-8859-1 in the restore.

After further investigation, it seems like my original idea will work: although it looks to me like iconv is essentially double-encoding the characters that were transformed to utf-8 by mysqldump, when I load them back into mysql I get utf-8 characters. Not totally comfortable with this yet...
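One possible reading of why the double-encoding comes out right, sketched in Python. This is an assumption, not something verified against the actual database: it supposes the restore step converts incoming UTF-8 back to ISO-8859-1 (as described above), which would undo exactly one layer of encoding. Both function names are hypothetical stand-ins for what iconv and the restore might be doing.

```python
def iconv_latin1_to_utf8(data: bytes) -> bytes:
    """Stand-in for converting from ISO-8859-1 to UTF-8: every input
    byte is taken as one ISO-8859-1 character and re-encoded."""
    return data.decode("iso-8859-1").encode("utf-8")


def restore_with_conversion(data: bytes) -> bytes:
    """Assumed restore behaviour: incoming UTF-8 text is converted
    back to ISO-8859-1 bytes before being stored."""
    return data.decode("utf-8").encode("iso-8859-1")


original = "äöüæ"
dump = original.encode("utf-8")              # the dump is already UTF-8
double_encoded = iconv_latin1_to_utf8(dump)  # the conversion adds a second layer
stored = restore_with_conversion(double_encoded)

# The restore strips one layer, leaving plain UTF-8 bytes in storage.
print(stored == dump)          # True
print(stored.decode("utf-8"))  # äöüæ
```

If that assumption holds, the two conversions cancel in just the right way; if the restore does no conversion, the data would stay double-encoded, so it is worth checking a few rows by hand.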

* (Previously our support for Unicode had consisted of walking through utf-8 strings looking for high-order characters we recognized, and flattening them to 7-bit ASCII.)
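For the curious, that kind of flattening might look something like the sketch below. It is not the code from work: `flatten_to_ascii` and the `EXTRA` table are invented for illustration, and it leans on Python's standard `unicodedata` module to strip accents.

```python
import unicodedata

# Hypothetical hand-made table for characters that have no Unicode
# decomposition into an ASCII base letter.
EXTRA = {"æ": "ae", "Æ": "AE", "ø": "o", "Ø": "O", "ß": "ss"}


def flatten_to_ascii(text: str, unknown: str = "?") -> str:
    """Reduce text to 7-bit ASCII, keeping base letters where possible."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)                 # already ASCII
        elif ch in EXTRA:
            out.append(EXTRA[ch])          # recognized special case
        else:
            # Decompose (e.g. "ö" becomes "o" plus a combining mark)
            # and keep only the ASCII part.
            decomposed = unicodedata.normalize("NFKD", ch)
            base = "".join(c for c in decomposed if ord(c) < 128)
            out.append(base or unknown)    # unrecognized: placeholder
    return "".join(out)


print(flatten_to_ascii("Tlön"))  # Tlon
print(flatten_to_ascii("äöüæ"))  # aouae
```

It loses information, of course, which is the whole reason for wanting real Unicode support.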

posted morning of Saturday, March 13th, 2010


What's of interest:

➪	At Conversational Reading, Scott Esposito links to a wonderful reading of "Tlön, Uqbar, Orbis Tertius," and mentions what sounds like a very interesting manifesto.

(Other links of interest at my Google+ page. It's recommended!)
