
Tweaking Lighttpd stat() performance with fcgi-stat-accel

If you serve lots of (small) files with Lighttpd you might notice you're not getting the throughput you would expect. Other factors (such as latencies caused by random read patterns) aside, a real show-stopper is the stat() system call, which is a blocking call (no parallelism). Some clever guys thought of a way to solve this: a FastCGI program that does the stat() first, so that by the time it returns, the stat information is already in the Linux cache and Lighty doesn't have to wait. In the meanwhile your Lighty thread can do other stuff.
(Lighty 1.5 will have a native way to do asynchronous stat() calls, but for 1.4 this hack works pretty damn well.)

This is explained on the HowtoSpeedUpStatWithFastcgi page on the Lighty wiki.

Now, for Netlog we needed to add some HTTP headers (Cache-Control: max-age, ETag, Expires and Last-Modified), so we patched up the code a bit to do that, along with some other small changes.

Of course this is documented too, on the FcgiStatAccelWithMoreHttpHeaders page on the Lighty wiki.

Have fun !

#include <stdio.h>
#include <stdlib.h>
#include "fcgiapp.h"
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <time.h>
#include "etag.h"
#include "buffer.h"

#define THREAD_COUNT 20

#define FORBIDDEN(stream) \
    FCGX_FPrintF(stream, "Status: 403 Forbidden\r\nContent-Type: text/html\r\n\r\n<h1>403 Forbidden</h1>\n");
#define NOTFOUND(stream, filename) \
    FCGX_FPrintF(stream, "Status: 404 Not Found\r\nContent-Type: text/html\r\n\r\n<h1>404 Not Found</h1>\r\n%s", filename);
#define SENDFILE(stream, filename, headers) \
    FCGX_FPrintF(stream, "%sX-LIGHTTPD-send-file: %s\r\n\r\n", headers, filename);

#define EXPIRATION_TIME (int) 60*60*24*30

int genheaders(char* mybuffer, size_t bufferlen, const char* file)
{
    char timebuf[32];
    char lastmodbuf[32];
    struct stat statbuf;
    time_t exp;
    time_t lastmod;
    buffer *etag_raw;
    buffer *etag_ok;

    // create buffers for the ETag
    etag_raw = buffer_init();
    etag_ok = buffer_init();

    // stat the file
    if (stat(file, &statbuf) != 0) {
        return -1;
    }

    // clear the output buffer
    memset(mybuffer, 0, bufferlen);

    // compute the expiration and last-modified timestamps (in GMT)
    exp = time(NULL) + EXPIRATION_TIME;
    lastmod = statbuf.st_mtime;
    strftime(timebuf, sizeof(timebuf) - 1, "%a, %d %b %Y %H:%M:%S GMT", gmtime(&exp));
    strftime(lastmodbuf, sizeof(lastmodbuf) - 1, "%a, %d %b %Y %H:%M:%S GMT", gmtime(&lastmod));

    etag_create(etag_raw, &statbuf, ETAG_USE_SIZE);
    etag_mutate(etag_ok, etag_raw);
    buffer_free(etag_raw);

    snprintf(mybuffer, bufferlen,
             "Cache-Control: max-age=%d\r\nETag: %s\r\nExpires: %s\r\nLast-Modified: %s\r\n",
             EXPIRATION_TIME, etag_ok->ptr, timebuf, lastmodbuf);
    buffer_free(etag_ok);
    return 0;
}

static void *doit(void *a)
{
    FCGX_Request request;
    int rc;
    char *filename;
    char extraheaders[192];

    FCGX_InitRequest(&request, 0, FCGI_FAIL_ACCEPT_ON_INTR);

    while (1) {
        // Some platforms require accept() serialization, some don't.
        // The documentation claims FCGX_Accept_r to be thread safe.
        // static pthread_mutex_t accept_mutex = PTHREAD_MUTEX_INITIALIZER;
        // pthread_mutex_lock(&accept_mutex);
        rc = FCGX_Accept_r(&request);
        // pthread_mutex_unlock(&accept_mutex);
        if (rc < 0)
            break;

        // get the filename
        if ((filename = FCGX_GetParam("SCRIPT_FILENAME", request.envp)) == NULL) {
            FORBIDDEN(request.out);
        // don't try to open directories
        } else if (filename[strlen(filename)-1] == '/') {
            FORBIDDEN(request.out);
        // stat the file and build the extra headers
        } else if (genheaders(extraheaders, sizeof(extraheaders), filename) != 0) {
            NOTFOUND(request.out, filename);
        // no error, serve it
        } else {
            SENDFILE(request.out, filename, extraheaders);
        }
        FCGX_Finish_r(&request);
    }
    return NULL;
}

int main(void)
{
    int i, j, thread_count;
    pthread_t* id;
    char* env_val;

    FCGX_Init();

    thread_count = THREAD_COUNT;
    env_val = getenv("PHP_FCGI_CHILDREN");
    if (env_val != NULL) {
        j = atoi(env_val);
        if (j != 0) {
            thread_count = j;
        }
    }

    id = malloc(sizeof(*id) * thread_count);
    for (i = 0; i < thread_count; i++) {
        pthread_create(&id[i], NULL, doit, NULL);
    }
    doit(NULL);
    free(id);
    return 0;
}
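For completeness, wiring the accelerator into Lighty 1.4 looks roughly like this (the URL prefix, socket and binary path are placeholders for your setup; see the wiki pages mentioned above for the authoritative configuration):

```
fastcgi.server = ( "/files/" =>
  (( "socket"            => "/tmp/fcgi-stat-accel.sock",
     "bin-path"          => "/usr/local/bin/fcgi-stat-accel",
     "check-local"       => "disable",
     "allow-x-send-file" => "enable",
  ))
)
```

The "allow-x-send-file" option is what lets the FastCGI program hand the file back to Lighty via the X-LIGHTTPD-send-file header instead of serving it itself.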


Any particular reason you're not just using the fam/gamin backend for stat caching? It depends on how long it caches, mind you, but that's event-based notification of stat changes: with enough cache space, you get a single stat per changed/unseen file.

In the grand scheme, presuming a large cache pool, that ought to outpace any attempt at making stat async, although admittedly it's still a blocking op when the cache entry is stale/missing (thus combining them could produce interesting results).

We tested with fam/gamin but had problems with it (I don't know any specifics about this, however). Afaik using either fam or gamin is generally not recommended, because many people have issues with them.

About the cache pool: doesn't Linux itself already cache this well enough? (Or does it rather purge old stat entries out of the cache in favor of, say, disk blocks?)

The kernel does maintain dentry caches. That said, it's still a context switch for every request. It can be fast, but pulling it from a userland pool will be faster (presuming it's implemented sanely, of course).

Re: fam/gamin issues, I haven't run into them personally. I saw a general response speed-up in my own testing, although gamin itself could get pretty pissy about proc use.

Good point about the context switches.

What about the server.max-worker param?
Have you tried using it?
It should work in a similar way to your solution...

max-worker just specifies the number of processes, and processes are very expensive: they use memory, they have their own cache, their own access-log fd, etc.
Also, each process has its own context, so the CPU will end up doing a lot of context switching, which is just a waste of CPU power. With FastCGI you have one process that can handle an immense number of requests, because the process does internal multiplexing (it keeps a struct for each request). (Although for a storage server this doesn't matter much; it matters more for web servers.)

So yes, with more processes you can address the same problem, but it is a much less efficient way to do so.
Instead of forking many expensive processes - which would spend most of their time waiting for disk anyway - we prefer to fix the problem at its root, with a low-impact solution.




