READIN: Programming (as of October 12, 2007)

(This is a page from my archives)

➻	Front page
➻	Most recent posts about Programming
➻	More posts about Projects

Archives index
Subscribe to RSS

This page renders best in Firefox (or Safari, or Chrome)

Programming

Posts about Programming

READIN started out as a place for me to keep track of what I am reading, and to learn (slowly, slowly) how to design a web site.

There has been some mission drift here and there, but in general that's still what it is. Some of the main things I write about here are reading books, listening to (and playing) music, and watching the movies. Also I write about the work I do with my hands and with my head; and of course about bringing up Sylvia.

The site is a bit of a work in progress. New features will come on-line now and then; and you will occasionally get error messages in place of the blog, for the forseeable future. Cut me some slack, I'm just doing it for fun! And if you see an error message you think I should know about, please drop me a line. READIN source code is PHP and CSS, and available on request, in case you want to see how it works.

See my reading list for what I'm interested in this year.

READIN has been visited approximately 236,737 times since October, 2007.

Friday, October 12th, 2007

🦋 Passed the first test

So in my log I see a bunch of requests today for

GET blog/?k=<keyword> \'\'
and(char(94)+user+char(94))>0 and 
\'\'\'\'=\'\'

where <keyword> is one of the keywords that links exist to on the site; and also I see that my script translated those requests to

<keyword> \\\'\\\' 
and(char(94)+user+char(94))>0 and 
\\\'\\\'\\\'\\\'=\\\'\\\'

before passing them to the database. So the queries just returned empty sets instead of wreaking whatever havoc they might have wruck unescaped. Yay PHP! Yay careful programming!

(Note: but while editing this post I realized there is a different kind of escaping that you have to do when you are writing to forms -- the < and > signs were translating to markup in my inputs. Funny I never ran into that problem on the old site, you wouldn't think it would be a PHP-vs.-ASP distinction.)

Update: So what do I have to do to ban these guys from my site? I tried putting the following in my httpd.conf:

<Directory (path to root of my site)>
    order allow,deny
    deny from (IP)
    deny from (IP)
    allow from all
</Directory>

and restarting the service, but that does not seem to have done it.

Another Update: I think I got it: the Directory directive in apache2/sites-available/default is overriding the directive in httpd.conf because httpd.conf is included first. I think I just need to take the default directive out.

posted evening of October 12th, 2007: Respond
➳ More posts about The site

Saturday, November 10th, 2007

🦋 Comment Spam II

OK: The comment spam filter I have in place right now is working (so far); but it would be pretty easy to circumvent if a spammer was determined enough. But I have in mind a pretty simple way to expand it and make it secure, and way better than the captcha images that everybody hates. (Drawback is, it relies on Javascript, which not every browser supports. This could be gotten around a couple of different ways.) I am going to try and implement it over the next few weeks and then I will write it up and try to get other people using it -- it's way better than captchas. (I won't write it up until it's in place because the writeup would include information on how to get around the current, insecure filter I have in place.)

Update: Oh wait, no it actually wouldn't be much more secure than the current scheme. A little harder to get around I guess.

posted afternoon of November 10th, 2007: Respond
➳ More posts about Programming Projects

Wednesday, January 23rd, 2008

🦋 watch

Cool! I found and fixed a bug today using gdb's watchpoint feature, which I have never tried before. (Not cool: the bug was a careless typo that I should never have introduced.)

posted evening of January 23rd, 2008: Respond
➳ More posts about Projects

Wednesday, February 13th, 2008

🦋 1024

A good thing to keep in mind when you are trying to write a TCP server that will support thousands of simultaneous connections: the default ulimit for maximum number of open files on a Linux system appears to be 1024. A further thing to keep in mind once you figure out how to adjust that upwards: there is a reason the default maximum is 1024!

See, if you use select() to multiplex your I/O (like I do), you will be passing structures of type fd_set around. These structures can only deal with file descriptors less than or equal to 1023. Try and set bit 1024, and you will break your program. But fear not! There is a solution; that solution is to use poll() instead of select(); apparently poll() is the new standard. First I ever heard of it! Switching from one to the other seems like it's not too hard, though I've just now started.

posted afternoon of February 13th, 2008: Respond

🦋 `vim` rules!

Here's a neat trick: let's say you forgot to close a code block somewhere in a large C source file, and you can't figure out where, and the compiler is not helping you. Try putting a } character at the very end of the file, placing the cursor on it, and presing the % key -- vim will jump to the most recent { which was not matched by a }. (Note: this will not work if you have unmatched {'s inside conditional compilation blocks, which is generally a bad habit anyway.)

posted afternoon of February 13th, 2008: Respond

Friday, February 15th, 2008

🦋 Multiplexing

The fix is in -- the server is using poll instead of select, a new version has been built and delivered to the client, it can handle loads of clients. Here is the long and short of how you do it (without error-checking, which is dull*):

The Old Code

void select_files(int *fds, int nfds)
{
    int i, maxid;
    fd_set rset, wset;
    timeval tval;

    FD_ZERO (&rset);
    FD_ZERO (&wset);
    maxid = 0;
    for (i = 0; i < nfds; ++i) {
        FD_SET (fds[i], &rset);
        FD_SET (fds[i], &wset);
        if (fds[i] > maxid) maxid = fds[i];
    }
    tval.tv_sec = 5;
    tval.tv_usec = 0;

    select (maxid + 1, &rset, &wset, NULL, &tval);
    for (i = 0; i < nfds; ++i) {
        if (FD_ISSET(fds[i], &rset)) 
            read_file(fds[i]);
        if (FD_ISSET(fds[i], &wset)) 
            write_file(fds[i]);
    }
}

The New Code

void poll_files(int *fds, int nfds)
{
    int i;
    pollfd *pfds = (pollfd *) 
              malloc (nfds * sizeof (pollfd));

    for (i = 0; i < nfds; ++i) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN | POLLOUT;
        pfds[i].revents = 0;
    }

    poll (pfds, nfds, 5000);
    for (i = 0; i < nfds; ++i) {
        if (pfds[i].revents & POLLIN) 
            read_file(fds[i]);
        if (pfds[i].revents & POLLOUT) 
            write_file(fds[i]);
    }
}

In order to take advantage of the newly accessible file descriptors above 1024, you will need to add these lines to your /etc/security/limits.conf file:

(username)          soft    nofile          1024
(username)          hard    nofile          4096

I chose 1024 for the soft limit since most apps are not interested in the high number of files, and 4096 for the hard limit because I read on some message boards that performance will degrade above that number. Feel free to choose other values.

You then need to make the following calls from your code (or call ulimit from the script that starts your application):

    struct rlimit nofile;

    if (getrlimit (RLIMIT_NOFILE, &nofile) != 0) {
        fprintf (stderr, "Could not get NOFILE");
        exit (1);
    }
    nofile.rlim_cur = 4096;
    if (setrlimit (RLIMIT_NOFILE, &nofile) != 0) {
        fprintf (stderr, "Could not set NOFILE");
        exit (1);
    }

*If you're interested in the error-checking code, drop me a line -- I just don't feel like typing it out right now.

posted afternoon of February 15th, 2008: Respond

Tuesday, March 18th, 2008

🦋 stderr and cgi

Handy to know: if you write to stderr from a CGI application (note: when I say "CGI application", I don't actually know what that means -- what I mean is, an application invoked by the http server in response to a GET or POST request and which uses environment variables to get information about the request), the output will go into the http server's log file. At least it will if the http server is Apache.

posted afternoon of March 18th, 2008: Respond

Wednesday, May 7th, 2008

🦋 Dream blogging

So last night I was maintaining code for a program which loaded a helper program for handling data files. Before it executed the helper program it would check the sum of the binary, I think because certain instances of the helper needed special handling; if the sum was not on a list of recognized values, the program would log an error and exit. Unfortunately the helper program was not stable and was being recompiled frequently; every time this happened I needed to edit the list of recognized sums, which was hard-coded into the main program, and recompile the main program. I was embarrassed about such a stupid bit of code being in the program so I was editing, compiling, and distributing the main program without mentioning it to anybody. What a stressful dream that was!

(Sort of ties everything together in a way, that I woke up humming Bessie Smith's "Gimme Pigfoot", which was in Gertrude Sturdley's post this week and which I was working on a fiddle version of last night.)

posted morning of May 7th, 2008: Respond
➳ More posts about Dreams

Friday, May 9th, 2008

🦋 Timeouts

Question about timeouts on select(): if anyone has ideas about this, please let me know in comments.

Obviously select() is not a real-time operation; if you pass in a 1-second timeout, you cannot assume that you will get to run again in one second, since the operating system is allotting time to all the processes on the machine: in an extremely busy environment, it could be several seconds before you get the processor back. But I'm wondering whether the timeout is 1 second of real time, or 1 second of execution time -- in the very busy environment where your process does not get another time slice for more than a second, would select() continue to wait on the files you passed in until it had waited for a second? Or would it return immediately?

(select() as it is used in this post should be read to mean "select() and poll()," since I'm assuming both API's behave the same in this regard. Who knows, maybe they don't! But that seems unlikely to me.)

posted afternoon of May 9th, 2008: 2 responses

Wednesday, September 10th, 2008

🦋 Indexing and Retrieval

So some people at my company are using Lucene as part of a document retrieval system they're building -- I have interacted with them some and had formed the impression that it was sort of a database product, vaguely like MySql but with more fully featured searching. But now I'm learning a little more about it and am very impressed -- it makes searching totally independent of data storage. This seems like a fantastic idea, I'm really looking forward to learning more about it.

posted evening of September 10th, 2008: Respond

`READIN`

Jeremy's journal