READIN

Jeremy's journal

What word will be spoken that will give meaning to all this?

José Saramago


(This is a subset of my posts)
Saturday, March 13th, 2010

Character sets

At work, I've been involved in a project to support the full Unicode character set in a more-than-cursory way*, getting to understand wide characters and utf-8 much more fully than I ever did before; and finally I am thinking I want to encode READIN in utf-8. All this time it has been in ISO-8859-1, which works ok as long as I escape unsupported Unicode characters; but it seems like time to get with the program.

My question is, what's the easiest way to convert my data? A lot of posts have got characters like äöüæ... which are going to show up as garbage if I just change the encoding of the blog. I was thinking I would use mysqldump and use iconv to convert the data. But somehow the output from mysqldump is already encoded with utf8. Does this mean I can just rebuild the database from this output and I'll be good to go? I'm a little confused why mysqldump is not respecting the encoding in the database...

Well, restoring from the output of mysqldump does not have the desired effect: characters that were ISO-8859-1 in the original db, and UTF-8 in the dump, are converted back to ISO-8859-1 in the restore.

After further investigation, it seems like my original idea will work: although it looks to me like iconv is essentially double-encoding the characters that were transformed to utf-8 by mysqldump, when I load them back into mysql I get utf-8 characters. Not totally comfortable with this yet...
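To make the double-encoding worry concrete, here is a minimal sketch of what an ISO-8859-1 to UTF-8 conversion does to each byte. It is a hand-rolled stand-in for illustration, not iconv's actual code, and the function name latin1_to_utf8 is made up for this example. Run it once on the Latin-1 byte for ä (0xE4) and you get the proper UTF-8 pair C3 A4; run it again on that output, as would happen if a dump that is already UTF-8 were treated as Latin-1, and you get the four bytes C3 83 C2 A4, which display as "Ã¤".

```c
#include <assert.h>
#include <stddef.h>

/* Convert ISO-8859-1 bytes to UTF-8. Every Latin-1 byte maps to the Unicode
   code point of the same value, so bytes >= 0x80 become two UTF-8 bytes.
   `out` must have room for 2 * len bytes. Returns the number of bytes
   written. (Hypothetical helper, for illustration only.) */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t i, n = 0;

    assert(in != NULL && out != NULL);
    for (i = 0; i < len; ++i) {
        if (in[i] < 0x80) {
            out[n++] = in[i];                  /* ASCII passes through */
        } else {
            out[n++] = 0xC0 | (in[i] >> 6);    /* lead byte: 110000xx */
            out[n++] = 0x80 | (in[i] & 0x3F);  /* continuation: 10xxxxxx */
        }
    }
    return n;
}
```

If the converted file contains sequences like C3 83 C2 A4 where a single accented letter should be, the data was converted once too often.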

* (Previously our support for Unicode had consisted of walking through utf-8 strings looking for high-order characters we recognized, and flattening them to 7-bit ASCII.)

posted morning of March 13th, 2010: Respond

Thursday, August 20th, 2009

errno

Perhaps you are a programmer; perhaps you use gdb to step through the programs you have written, looking for bugs; perhaps you have wondered why gdb will not let you examine the contents of the errno variable. Here's the deal: if you are typing print errno and getting the message Cannot access memory at address 0x8, it is because errno is not an actual variable. In glibc's headers it is a macro, so by the time your code is compiled the preprocessor has replaced every reference to errno with *__errno_location() --

print *__errno_location()
will give you what you're looking for.
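A small sketch that shows the same thing from inside a program. The function names and the path are made up for this example; the only real APIs used are open(), close(), and errno itself. The second function reports whether errno is a preprocessor macro in whatever C library you compile against (the C standard allows either a macro or a real object, so the answer can differ between libraries).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Try to open a path that should not exist and return the errno it sets.
   The path is a made-up example. Returns 0 if the open unexpectedly works. */
int errno_after_failed_open(void)
{
    int fd;

    errno = 0;
    fd = open("/nonexistent/readin-example", O_RDONLY);
    if (fd >= 0) {
        close(fd);
        return 0;
    }
    assert(errno != 0);     /* a failed open() must have set errno */
    return errno;           /* on glibc this line expands to a function call */
}

/* 1 if errno is a preprocessor macro in this C library, 0 otherwise. */
int errno_is_macro(void)
{
#ifdef errno
    return 1;
#else
    return 0;
#endif
}
```

Set a breakpoint on the return in errno_after_failed_open() and compare print errno with print *__errno_location() to see the difference for yourself.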

posted morning of August 20th, 2009: 5 responses

Friday, April 24th, 2009

(Relatively) Close to the metal

Neat-o, I found a new tool for testing stuff out! It is called netcat and it is essentially what I've always wished telnet would be. You can basically open up a socket and listen as the server or attach as a client, and see all the traffic from the other party, and type in the traffic from your side. Be sure to read the (brief) man page as the tutorial pages I've found on the web this morning all omit important information and leave you scratching your head as to what is going wrong.

The two things (at first glance) that nc has over telnet: primary point is that you can listen on a port, and thus emulate a server; telnet does not do this at all. Secondary advantage, the whole thing is much cleaner and simpler, and easier to run as a batch job; you don't have to learn escape characters or anything like that. Drawback is that error reporting is pretty minimal; but I can live with that.
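For a sense of what the two roles involve underneath, here is a rough sketch in C of one listener and one client talking over the loopback interface. It is not netcat's code; the function name is made up, and it plays both sides in a single thread just to keep the example short. The listening half (bind, listen, accept) is the part telnet cannot do.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Listen on an ephemeral loopback port (the server role), connect to it
   (the client role), send msg from client to server, and copy what the
   server side receives into buf. msg should be non-empty.
   Returns the number of bytes received, or -1 on error. */
ssize_t loopback_send_once(const char *msg, char *buf, size_t buflen)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    int lsock, csock, conn = -1;
    ssize_t n = -1;

    assert(msg != NULL && buf != NULL);
    lsock = socket(AF_INET, SOCK_STREAM, 0);
    csock = socket(AF_INET, SOCK_STREAM, 0);
    if (lsock < 0 || csock < 0) goto done;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                      /* let the kernel pick a port */
    if (bind(lsock, (struct sockaddr *) &addr, sizeof addr) != 0) goto done;
    if (listen(lsock, 1) != 0) goto done;
    if (getsockname(lsock, (struct sockaddr *) &addr, &alen) != 0) goto done;

    /* The kernel completes the handshake for a listening socket, so one
       thread can connect first and accept afterwards. */
    if (connect(csock, (struct sockaddr *) &addr, sizeof addr) != 0) goto done;
    conn = accept(lsock, NULL, NULL);
    if (conn < 0) goto done;

    if (send(csock, msg, strlen(msg), 0) < 0) goto done;
    n = recv(conn, buf, buflen, 0);

done:
    if (conn >= 0) close(conn);
    if (csock >= 0) close(csock);
    if (lsock >= 0) close(lsock);
    return n;
}
```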

posted morning of April 24th, 2009: Respond

Saturday, January 10th, 2009

Dream Blogging

An anxiety dream last night about the code I am maintaining: a customer reported a bug, which on analysis proved to be a corruption in shared memory caused by the customer's input. When a year was specified in the date parameter, the corruption would occur -- if only month and day were specified, everything was ok. It was pretty mysterious how this would not have been discovered years ago as the section of code governing this parameter had not been modified in a long time; but the bug was pretty easy to track down and fix. Except, a little later I was talking to my boss and he thought I had told him the bug was not fixable and we would have to find a workaround. I was going crazy trying to figure out how to respond to this because I had forgotten where I had put the fix in and I had not commented it or kept any records...

Later in the night, I dreamed I was examining and modifying the source code for my dreams. It was not clear how or whether I would be able to recompile and distribute the fixes to my consciousness.

posted morning of January 10th, 2009: Respond

Tuesday, November 25th, 2008

O Excellent New Tool!

You know what debugging tool I just hate having to deal with? Purify, is what. Its interface seems insanely cumbersome to me, it's hard to use in conjunction with gdb, and I dislike having to compile a separate version for heap-checking. Well today, my co-worker Nick hipped me to valgrind, which just seems like it was made for me. Exactly suited to my style of debugging. Basically it just spits out a ton of messages to stderr, interspersed with your own stderr output, so you can troubleshoot very quickly and come up with a bug location to reproduce in gdb.

My goal is to become a power user of valgrind -- starting with no knowledge of the product I was quickly able to isolate the problem I was seeing. If I acquaint myself with its features it's going to make a really valuable addition to the toolchest.

posted evening of November 25th, 2008: Respond

Wednesday, September 10th, 2008

Indexing and Retrieval

So some people at my company are using Lucene as part of a document retrieval system they're building -- I have interacted with them some and had formed the impression that it was sort of a database product, vaguely like MySQL but with more fully featured searching. But now I'm learning a little more about it and am very impressed -- it makes searching totally independent of data storage. This seems like a fantastic idea, and I'm really looking forward to learning more about it.

posted evening of September 10th, 2008: Respond

Friday, May 9th, 2008

Timeouts

Question about timeouts on select(): if anyone has ideas about this, please let me know in comments.

Obviously select() is not a real-time operation; if you pass in a 1-second timeout, you cannot assume that you will get to run again in one second, since the operating system is allotting time to all the processes on the machine: in an extremely busy environment, it could be several seconds before you get the processor back. But I'm wondering whether the timeout is 1 second of real time, or 1 second of execution time -- in the very busy environment where your process does not get another time slice for more than a second, would select() continue to wait on the files you passed in until it had waited for a second? Or would it return immediately?

(select() as it is used in this post should be read to mean "select() and poll()," since I'm assuming both APIs behave the same in this regard. Who knows, maybe they don't! But that seems unlikely to me.)
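One way to poke at the question is to measure it. The sketch below (the function name timed_select is made up) calls select() with no descriptors at all, so the timeout is the only thing that can end the call, and records both elapsed wall-clock time and the CPU time the process consumed. As far as I understand the POSIX wording, the timeout is an interval of real time, and this experiment is consistent with that: the wall clock advances by the full timeout while the CPU clock barely moves. It does not simulate a heavily loaded machine, so treat it as supporting evidence rather than proof.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <time.h>

/* Sleep in select() with no descriptors for timeout_ms milliseconds.
   Stores elapsed wall-clock and process CPU time, both in milliseconds,
   and returns select()'s result (0 means the timeout expired). */
int timed_select(long timeout_ms, double *wall_ms, double *cpu_ms)
{
    struct timeval tval;
    struct timespec w0, w1, c0, c1;
    int rc;

    assert(wall_ms != NULL && cpu_ms != NULL);
    tval.tv_sec = timeout_ms / 1000;
    tval.tv_usec = (timeout_ms % 1000) * 1000;

    clock_gettime(CLOCK_MONOTONIC, &w0);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c0);
    rc = select(0, NULL, NULL, NULL, &tval);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c1);
    clock_gettime(CLOCK_MONOTONIC, &w1);

    *wall_ms = (w1.tv_sec - w0.tv_sec) * 1e3 + (w1.tv_nsec - w0.tv_nsec) / 1e6;
    *cpu_ms  = (c1.tv_sec - c0.tv_sec) * 1e3 + (c1.tv_nsec - c0.tv_nsec) / 1e6;
    return rc;
}
```

If the timeout were counted in execution time, a call like timed_select(200, ...) would take far longer than 200 ms of wall time, since the process uses almost no CPU while it is blocked.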

posted afternoon of May 9th, 2008: 2 responses

Wednesday, May 7th, 2008

Dream blogging

So last night I was maintaining code for a program which loaded a helper program for handling data files. Before it executed the helper program it would check the sum of the binary, I think because certain instances of the helper needed special handling; if the sum was not on a list of recognized values, the program would log an error and exit. Unfortunately the helper program was not stable and was being recompiled frequently; every time this happened I needed to edit the list of recognized sums, which was hard-coded into the main program, and recompile the main program. I was embarrassed about such a stupid bit of code being in the program so I was editing, compiling, and distributing the main program without mentioning it to anybody. What a stressful dream that was!

(Sort of ties everything together in a way, that I woke up humming Bessie Smith's "Gimme Pigfoot", which was in Gertrude Sturdley's post this week and which I was working on a fiddle version of last night.)

posted morning of May 7th, 2008: Respond

Tuesday, March 18th, 2008

stderr and cgi

Handy to know: if you write to stderr from a CGI application (note: when I say "CGI application", I don't actually know what that means -- what I mean is, an application invoked by the http server in response to a GET or POST request and which uses environment variables to get information about the request), the output will go into the http server's log file. At least it will if the http server is Apache.
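A minimal sketch of such a program, assuming the standard CGI environment variables REQUEST_METHOD and QUERY_STRING. The function name and the "cgi-example" log prefix are made up. The handler takes its two streams as parameters so it can be tried without a web server; under a server you would call it from main() with stdout and stderr.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Handle one CGI request: the response goes to `out` (stdout under a web
   server) and a diagnostic line goes to `log` (stderr under a web server,
   which is what ends up in the server's error log).
   Returns 0 on success, 1 if REQUEST_METHOD is missing. */
int handle_cgi_request(FILE *out, FILE *log)
{
    const char *method = getenv("REQUEST_METHOD");
    const char *query = getenv("QUERY_STRING");

    assert(out != NULL && log != NULL);
    if (method == NULL) {
        fprintf(log, "cgi-example: REQUEST_METHOD not set\n");
        return 1;
    }
    fprintf(log, "cgi-example: %s request, query \"%s\"\n",
            method, query ? query : "");

    /* A CGI response is headers, a blank line, then the body. */
    fprintf(out, "Content-Type: text/plain\r\n\r\n");
    fprintf(out, "Hello from %s\n", method);
    return 0;
}
```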

posted afternoon of March 18th, 2008: Respond

Friday, February 15th, 2008

Multiplexing

The fix is in -- the server is using poll instead of select, a new version has been built and delivered to the client, it can handle loads of clients. Here is the long and short of how you do it (without error-checking, which is dull*):

The Old Code

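#include <sys/select.h>

void read_file(int fd);     /* application I/O handlers, defined elsewhere */
void write_file(int fd);

/* Wait up to 5 seconds, then service every descriptor that is ready.
   Each descriptor must be below FD_SETSIZE (typically 1024). */
void select_files(int *fds, int nfds)
{
    int i, maxid;
    fd_set rset, wset;
    struct timeval tval;

    /* Build the read and write sets and find the highest descriptor. */
    FD_ZERO (&rset);
    FD_ZERO (&wset);
    maxid = 0;
    for (i = 0; i < nfds; ++i) {
        FD_SET (fds[i], &rset);
        FD_SET (fds[i], &wset);
        if (fds[i] > maxid) maxid = fds[i];
    }
    tval.tv_sec = 5;
    tval.tv_usec = 0;

    select (maxid + 1, &rset, &wset, NULL, &tval);
    for (i = 0; i < nfds; ++i) {
        if (FD_ISSET(fds[i], &rset))
            read_file(fds[i]);
        if (FD_ISSET(fds[i], &wset))
            write_file(fds[i]);
    }
}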

The New Code

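#include <poll.h>
#include <stdlib.h>

void read_file(int fd);     /* application I/O handlers, defined elsewhere */
void write_file(int fd);

/* Same job as select_files, but with no FD_SETSIZE ceiling. */
void poll_files(int *fds, int nfds)
{
    int i;
    struct pollfd *pfds = (struct pollfd *)
              malloc (nfds * sizeof (struct pollfd));

    for (i = 0; i < nfds; ++i) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN | POLLOUT;
        pfds[i].revents = 0;
    }

    poll (pfds, nfds, 5000);    /* timeout is in milliseconds */
    for (i = 0; i < nfds; ++i) {
        if (pfds[i].revents & POLLIN)
            read_file(fds[i]);
        if (pfds[i].revents & POLLOUT)
            write_file(fds[i]);
    }
    free (pfds);                /* the array is rebuilt on every call */
}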
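The reason for the switch is that an fd_set is a fixed-size bitmap: it only has room for descriptors below FD_SETSIZE, which is typically 1024 on Linux with glibc, and calling FD_SET on anything higher writes outside the set. A tiny sketch of the check select()-based code would need (the function name is made up):

```c
#include <assert.h>
#include <sys/select.h>

/* 1 if fd can be stored in an fd_set, 0 if not. FD_SET on a descriptor
   at or above FD_SETSIZE is undefined behavior. */
int fits_in_fd_set(int fd)
{
    assert(fd >= 0);
    return fd < FD_SETSIZE;
}
```

poll() has no such ceiling because it takes an array of descriptors rather than a bitmap indexed by descriptor number.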

In order to take advantage of the newly accessible file descriptors above 1024, you will need to add these lines to your /etc/security/limits.conf file:

(username)          soft    nofile          1024
(username)          hard    nofile          4096

I chose 1024 for the soft limit since most apps are not interested in the high number of files, and 4096 for the hard limit because I read on some message boards that performance will degrade above that number. Feel free to choose other values.

You then need to make the following calls from your code (or call ulimit from the script that starts your application):

    /* needs <stdio.h>, <stdlib.h> and <sys/resource.h> */
    struct rlimit nofile;

    if (getrlimit (RLIMIT_NOFILE, &nofile) != 0) {
        fprintf (stderr, "Could not get NOFILE\n");
        exit (1);
    }
    nofile.rlim_cur = 4096;     /* must not exceed the hard limit */
    if (setrlimit (RLIMIT_NOFILE, &nofile) != 0) {
        fprintf (stderr, "Could not set NOFILE\n");
        exit (1);
    }
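A slightly more defensive variant, as a sketch: it clamps the request to whatever hard limit is actually in force, so it still does something useful when limits.conf has not been edited, and it never lowers a soft limit that is already high enough. The function name raise_nofile_limit is made up for this example.

```c
#include <assert.h>
#include <sys/resource.h>

/* Raise the soft RLIMIT_NOFILE to `wanted`, clamped to the hard limit.
   Never lowers the current soft limit. Returns the resulting soft limit,
   or 0 on error. */
rlim_t raise_nofile_limit(rlim_t wanted)
{
    struct rlimit nofile;

    assert(wanted > 0);
    if (getrlimit(RLIMIT_NOFILE, &nofile) != 0)
        return 0;
    if (nofile.rlim_max != RLIM_INFINITY && wanted > nofile.rlim_max)
        wanted = nofile.rlim_max;       /* cannot exceed the hard limit */
    if (wanted > nofile.rlim_cur) {
        nofile.rlim_cur = wanted;
        if (setrlimit(RLIMIT_NOFILE, &nofile) != 0)
            return 0;
    }
    return nofile.rlim_cur;
}
```

Calling raise_nofile_limit(4096) at startup and checking the return value tells the server how many descriptors it can really count on.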

*If you're interested in the error-checking code, drop me a line -- I just don't feel like typing it out right now.

posted afternoon of February 15th, 2008: Respond
