Apache was probably the first Linux application I learned how to configure. However, I’ve gotten a bit frustrated with it recently…
The Problem
A memory leak. Apache is eating up memory so quickly that I need to restart it every couple days or risk my entire server grinding to a halt as it starts swapping wildly. I’ve pored over log files and pmap output, but I still can’t figure out where the problem lies. Curse you monolithic in-process architecture!
Actually I know what my problem is, I’m running a mess of modules:
- ssl – 2 certificates on 2 ports
- php5 – blerg, who doesn’t have to run this?
- suphp – I suspect this is my problem, but I can’t prove it. A client’s 3rd party web application requires it, but I think it’s easily replaceable with FastCGI.
- wsgi – No complaints. Python apps are out-of-process thankfully.
- proxy – Again no complaints. Can’t imagine how this module could cause any problems except it does proxy some large (multi-megabyte, not huge) POSTs at times. I can’t imagine a memory leak could slip into this module without a lot of people noticing.
Solution A: Apache+FastCGI
I love the idea of putting each web application in its own process and letting Apache just act as an HTTP router. FastCGI seems to have all the features I need, and I’m not really worried about the CPU overhead incurred by IPC.
However, there are 2 competing FastCGI modules for Apache, and I have no idea what to choose. Anecdotally the official mod_fastcgi is buggy and fastcgi.com is a spam infested wasteland. However, I’ve found no authoritative source saying: “fastcgi is dead, long live fcgid!” (Lame excuse, I know.)
Solution B: Lighttpd
I know Lighty is the darling of Rails sites, but whenever I stop by its site I’m greeted with a list of recently fixed security bugs, and now it seems as though they’re rewriting the core!
I’m sure Lighty is a high-quality, intelligently engineered project, but it seems to be the definition of immature. Not necessarily bad (in fact it usually means it’s progressing quickly!), but perhaps not as reliable as good old workhorses like Apache.
Solution C: Cherokee
I’ve been following Cherokee for some time now and running it locally on my workstation. I love the web interface. I’m usually a very anti-webmin, pro-vim kind of guy, but I’m sick of editing Apache’s config files. I do it about once a month and therefore it always takes lots of double-checking the docs. I don’t know why, but its configuration has just never felt natural to me.
However, the lead Cherokee developer’s bravado is by far the most off-putting aspect of the project. He mocks modwsgi and posts simplistic benchmarks showing Cherokee to be the fastest web server, but meanwhile Cherokee churns out numerous bug patch releases in-between feature releases and has yet to reach 1.0 status.
It seems like an excellent project technically, but I’m afraid there will be negative consequences for the lead developer’s hubris. (I’m not meaning to insult the guy. He’s probably a far better hacker than I’ll ever be. Self-promotion just makes me uncomfortable.)
Solution D: nginx
I don’t know much about nginx except that it works. Basically all I’ve heard about it is:
- It works.
- It’s fast. Really fast.
While “working” is definitely my primary objective, nginx seems a bit bare bones for me. I just don’t think I’m the target demographic. I’d kind of like for my web server to handle spawning and killing of FastCGI processes.
nginx feels like git to me. Those who know it: use it and love it. Those who don’t: stand in fear and awe of its unbridled power.
…or maybe it’s just a nice simple barebones HTTP server…
Conclusions?
I think Solution A: FastCGI is the most sensible. Apache has always served me well, and the memory leak is most likely due to that shoddy suphp module.
Moving my web applications to FastCGI is also the best way to prepare to move to one of these 2nd generation web servers.
However, I’m getting kind of sick of Apache, and the ambiguousness of which FastCGI solution to choose is fairly annoying.
So dear lazyweb, for your everyday web developer consultant looking to run a bunch of PHP and Python web applications, what HTTP server stack should I use? (Debian Lenny packages are a plus.)
Tags: apache, cherokee, lazyweb, lighttpd, nginx, PHP, Python
When you say ‘wsgi’, I am guessing you probably mean mod_wsgi. I wish people would say mod_wsgi if that is what they mean. The mod_wsgi module is only one implementation of the WSGI interface; it is not WSGI itself.
Anyway, do you perform ‘restart’ or ‘graceful-restart’ on your Apache instance a lot?
There is a known issue with some memory leaks when using mod_python and mod_wsgi when doing these. Part of the problem is the respective modules, and some is because some versions of the Python interpreter itself leak some memory when destroyed and then reinitialised. The amount isn’t overly significant, but if you do restarts a lot for one reason or another, it could add up.
This memory leak being a source of problems would be evident through the Apache parent process growing over time with each restart. Because the leak occurs in the Apache parent process, any new Apache child worker processes, or mod_wsgi daemon mode processes, also appear to use more and more memory over time if there is a need for them to be recycled.
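One way to check for the symptom described above is to log the resident memory of the Apache parent process after each restart and see whether it keeps growing. The following is only a rough sketch: the process name `apache2` (the Debian binary name) and the helper names `parse_ps`, `apache_parents`, and `sample` are assumptions made for illustration, not anything from mod_wsgi or Apache.

```python
import subprocess


def parse_ps(output):
    """Parse the text printed by `ps -eo pid,ppid,rss,comm` into a list of dicts."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        pid, ppid, rss, comm = line.split(None, 3)
        rows.append({
            "pid": int(pid),
            "ppid": int(ppid),
            "rss_kb": int(rss),  # ps reports RSS in kilobytes
            "comm": comm.strip(),
        })
    return rows


def apache_parents(rows, name="apache2"):
    """Return the processes called `name` whose parent is NOT also called `name`.

    The process name is an assumption; adjust it to whatever your system uses.
    """
    pids = {r["pid"] for r in rows if r["comm"] == name}
    return [r for r in rows if r["comm"] == name and r["ppid"] not in pids]


def sample(name="apache2"):
    """Run ps once and return the parent process rows (requires a Unix `ps`)."""
    out = subprocess.run(
        ["ps", "-eo", "pid,ppid,rss,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return apache_parents(parse_ps(out), name)
```

Calling `sample()` from cron and appending the `rss_kb` value to a log would show whether the parent grows with each restart, which is the pattern the comment above attributes to this particular leak.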
The memory leaks have been dealt with in mod_wsgi 3.0 development version in subversion, but not yet backported to mod_wsgi 2.X (2.4). The changes can’t help with the leaks in Python interpreter itself directly, but there are other changes in progress which would allow one to optionally defer Python interpreter initialisation till point where mod_wsgi daemon processes are created. This way if using mod_wsgi daemon mode only, the memory leaks in Python interpreter will not impact on parent Apache process or Apache child worker processes as no longer a need to destroy and reinitialise Python interpreter. Downside of this delayed initialisation though is slightly bigger resident memory size for mod_wsgi daemon processes as do not benefit from sharing when Python is initialised in Apache parent process before processes are forked.
If this at all sounds about right, you might like to come over to the mod_wsgi list on Google Groups to discuss further.
@Graham: First, thanks for the great module!
Yes, I mean modwsgi when I say “wsgi”. In the context of talking about Apache modules it seems silly to repeat the mod_ prefix. I should have also specified I’m using modwsgi in Daemon mode (in-process only seems useful for single purpose servers).
I do restart (not even gracefully…) Apache often to try and quell the memory leak. I’m sad to hear the leak might be in modwsgi!
I’m using modwsgi packaged with Debian Lenny, so it’s built against Python 2.5. I thought only 2.4 had (interpreter) memory leaks, but I could be wrong.
I’ll try to hop over to the modwsgi group ASAP since it sounds like it might be involved.
I wouldn’t mind running modwsgi for Python web apps and FastCGI for PHP apps.
If you are happy with checking out stuff from subversion, the change to address the memory leak has been backported to mod_wsgi 2.X subversion branch.
http://modwsgi.googlecode.com/svn/branches/mod_wsgi-2.X/
Details of other changes in that branch at:
http://code.google.com/p/modwsgi/wiki/ChangesInVersion0204
mike: thx for a lot of useful information. i want to challenge your views on fastcgi tho. i believe it to be an outmoded, outdated thing. it looks like a vhs tape—you know what those are heading for.
nginx is definitely a tad barebones, but it works. to configure it is a breeze. frankly i think mod_python is on its way out, too. mod_wsgi is a great project, but suffers from its concept—a compiled thingie bolted onto apache.
with nginx basically what you do is you run your webapplication on some port, say 3000, and then go and tell the server that url yourhost.com is meant as an alias for localhost:3000. its that simple. no more fuzzing around with intricate mod_xy things. and forget about fastcgi, its dead. i’d rather use classical cgi than any f/s/XYcgi.
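To make the "run your app on a port and point nginx at it" idea concrete, here is a minimal sketch of the backend half using only the Python standard library. The names `application` and `serve` are illustrative, and `wsgiref` is a development server standing in for whatever you would really run (CherryPy's server is what others in this thread use). On the nginx side, the `proxy_pass` directive pointing at `http://127.0.0.1:3000` is what forwards requests to it.

```python
from wsgiref.simple_server import make_server


def application(environ, start_response):
    """A trivial WSGI app; replace with your real application object."""
    body = b"Hello from the backend on port 3000\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve(host="127.0.0.1", port=3000):
    """Listen on localhost only, so the app is reachable solely through the front-end proxy."""
    with make_server(host, port, application) as httpd:
        httpd.serve_forever()
```

Running `serve()` under a process supervisor keeps the app alive independently of the web server, which is the main appeal of this layout.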
Michael,
You can run separate instances of apache for your mod_wsgi and your php stuff. I forget the command but you can load two different apache services w/ different apache.conf files.
I doubt moving to a different web server will fix all your memory leak problems. Apache is the most reliable web server in my limited experience. Alas, my experience is quite limited.
MOD_WSGI
In my experience, this works fine and doesn’t leak memory. However, I cannot restart WSGI apps individually because mod_wsgi does something stupid and fails. I have to do a force-reload of Apache when I redeploy an app, which seems ridiculous.
AUTOMATIC RESTARTING
Both PHP and suPHP have a long record of memory leaks. Many sites using them rely on monitoring systems like Monit to restart Apache when it’s consuming too much memory. While hackish, this approach will let you keep your setup as-is.
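The restart-on-high-memory approach can be sketched in a few lines. This is not Monit's implementation, just an illustration of the same idea: only trigger a restart after memory has been over the limit for several consecutive checks, so one brief spike does not bounce the server. The class name `MemoryWatchdog` and its parameters are hypothetical.

```python
class MemoryWatchdog:
    """Signal a restart after `cycles` consecutive samples above `limit_kb`."""

    def __init__(self, limit_kb, cycles=3):
        self.limit_kb = limit_kb
        self.cycles = cycles
        self._over = 0  # consecutive over-limit samples seen so far

    def observe(self, total_rss_kb):
        """Record one sample; return True when a restart should be triggered."""
        if total_rss_kb > self.limit_kb:
            self._over += 1
        else:
            self._over = 0  # any healthy sample resets the streak
        if self._over >= self.cycles:
            self._over = 0  # start counting afresh after the restart
            return True
        return False
```

A supervising loop would feed `observe()` the summed RSS of all Apache processes every minute or so and run the init script's restart action whenever it returns True.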
RUNNING FASTCGI
A. Apache+FastCGI – As of a year ago, its various FastCGI modules seemed abandoned and unsupported, and didn’t compile without manual edits to .c files. I got mod_fcgid running, but had to dump it because it’d regularly fail with cryptic errors or lose communication with processes it spawned, and thus slowly consume all available memory if not supervised.
B. Lighttpd – As of two years ago, this was a clusteruck and its authors didn’t care. I was forced to admin a Lighty setup for half a year because a client refused to let me replace it. In a span of about 30 seconds, Lighty could kill a dedicated server by fork bombing, swamping its memory and CPU. The only way to curtail this was to have a watchdog check every second to determine if Lighty had gone rabid and restart it. About three years ago, there were some misguided articles recommending running Rails on Lighty, but everyone running Rails has long ago moved to specialized Ruby servers like Mongrel, Thin or Passenger, which entirely avoid the FastCGI mess.
C. Cherokee – Dunno. I never got the impression that it was a serious project.
D. Nginx – It’s fast, efficient, totally reliable, and the English documentation is decent. However, “Unlike Apache or Lighttpd, Nginx does not automatically spawn FCGI processes. You must start them separately.” This isn’t hard, but requires extra work and I wish it’d just do it on its own because I can’t see why it’s so difficult. This is the only FastCGI implementation I’ve been able to live with. See articles: http://wiki.codemongers.com/NginxFcgiExample && http://drupal.org/node/110224
I am very happy using Nginx for nearly all my web serving necessities. I am currently running Python WSGI apps using CherryPy’s standalone server, some Rails apps using Thin, and Nginx in front of them for static content and proxying requests to them. I also run some PHP stuff from time to time using FastCGI, also with Nginx as frontend. In general I am very happy with the solution, as I can run every app under different credentials and also use daemontools for all of them (except Nginx) to keep them running even if they crash. Apart from its proven efficiency, one of the things I like most about Nginx is that its configuration file syntax is easy to read and write.
Cheers
Thanks to everyone for the great and detailed advice!
Lighty is right out at this point, but it looks like now it’s up to me to do some testing and figure out what works best for my environment. (Honestly, I think Cherokee is out as well, but I’ve already got it installed locally so it will probably make it to round 2.)
I should have given more details about my setup: Other than the couple of PHP apps, I run a few CherryPy apps behind mod_proxy (love this setup, but I need to standardize my startup scripts), and a couple Django apps using mod_wsgi (more Django apps are coming soon though).
Bolting CherryPy’s web server on the front of Django and proxying it via nginx is actually pretty tempting if I can find a comfortable way to manage starting/restarting all the processes.
Anyway, thanks again for the feedback, and I’ll keep everyone posted!
(More feedback is also welcome, especially concerning Cherokee and PHP+FastCGI. nginx only ever gets glowing reviews.)
@bryan’s suggestion of running separate Apache processes would be a good way to isolate the fault, but can be annoying to configure if you’re relying on the Debian/Ubuntu-style Apache configuration layout.
@hario’s suggestion of using daemontools to manage the FastCGI applications is a good one. Here’s an article on the topic: http://johnleach.co.uk/words/archives/2007/04/08/262/ — the `spawn-fcgi` tool described comes with lighttpd, but it works fine with any other web server.
One has to separate fastcgi as a protocol from a specific implementation which supports it when one tries to make a judgement that it is dead. Although some implementations may be getting neglected, it doesn’t mean that fastcgi as a whole is dead or is a dead end.
As a counter example, in Apache 2.4 (still in development) there will be a mod_proxy_fcgi module. Thus support for fastcgi is becoming part of the core set of modules in Apache. There is also a process going on at the moment for mod_fcgid to be moved under the control of the Apache Software Foundation; all the ownership and copyright issues are being sorted out at the moment. I can’t recollect whether this meant it would go direct into Apache itself, or whether there would be a merging of some features into mod_proxy_fcgi if this transfer of control works out.
@igalko, most people who have issues with mod_wsgi application restarting have them because they haven’t read the documentation in enough detail to know in what circumstances it works. This isn’t helped by people on irc channels continually giving out wrong information about it. In short, if you are using embedded mode you must do a full Apache restart to pick up code changes. The ability to touch the WSGI script file and have it reload the application only applies when using mod_wsgi daemon mode. Many times though when people think they have configured daemon mode properly, they haven’t, and they are still running in embedded mode and so it doesn’t work. Because daemon mode is only available with Apache 2.X on UNIX, and only if Apache has thread support plus other stuff compiled in to the APR library, also don’t expect it to work on Windows or Apache 1.3. If you are confident that you are using daemon mode and are still having issues with mod_wsgi application reloading, use the mod_wsgi list on Google Groups to sort out why.
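A quick way to find out which mode an application is really running in is to have it report the `mod_wsgi.process_group` value from the WSGI environment. As far as I understand the mod_wsgi documentation, that value is an empty string in embedded mode and the daemon process group name in daemon mode. The sketch below relies on that; the helper names `wsgi_mode` and `touch` are made up for illustration.

```python
import os


def wsgi_mode(environ):
    """Return 'daemon' if mod_wsgi reports a process group, else 'embedded'.

    A missing key (for example under a non-mod_wsgi server) is also
    reported as 'embedded', since no daemon process group is in use.
    """
    return "daemon" if environ.get("mod_wsgi.process_group") else "embedded"


def application(environ, start_response):
    """Temporary diagnostic app: shows the mode this request was served in."""
    body = ("mod_wsgi mode: %s\n" % wsgi_mode(environ)).encode("ascii")
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def touch(path):
    """Update the WSGI script file's modification time (same effect as `touch`)."""
    os.utime(path, None)
```

If the diagnostic page says "embedded", touching the script file will not reload anything and a full Apache restart is still needed; once it says "daemon", calling `touch()` on the script file from a deploy script should be enough.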
If you keep using Apache, I have had great luck with the latest mod_scgi + supervisor (supervisord.org). I just have a managed process that starts django as follows:
# set DJANGO_SETTINGS_MODULE in the environment before this runs
if __name__ == '__main__':
    from flup.server.scgi import WSGIServer
    from django.core.handlers.wsgi import WSGIHandler
    # Serve the Django WSGI handler over SCGI
    WSGIServer(WSGIHandler()).run()
And then I use SCGIMount in a virtualhost. Works like a charm.
If you want to get away from apache, though, I have been intrigued by Spawning, a python+libevent-based webserver. (Look on PyPI.) You can serve from spawning directly, but if you need to run several different types of apps, proxying behind nginx would probably work.
@VanL: Spawning! I always mean to check it out, but never get the chance. Using it to host Django sounds very tempting.
It seems PHP is still the thorn in my side. FastCGI definitely sounds like the way forward for it, but it sounds like nginx might be the best way to achieve a stable FastCGI+PHP setup.
(Sorry that wordpress messed up the tabs in your code… I really need to get around to writing yet-another-Django-powered-blog-engine.)
Lest it go unmentioned, there is also SCGI[1] (simpler [2] than FastCGI), driven by either Apache (mod_scgi), Cherokee (?), lighttpd (native support) and nginx (iirc with third party add-ons). I believe there are SCGI recipes for Django lurking about. SCGI has been around for many years now running sizable sites long before Django was a twinkle in someone’s eye. Perhaps it deserves a re-look.
Personally I do not care for Cherokee’s reliance on a web admin for setup – that turned me off after a few minutes of playing with it. Maybe I’m wired differently but I’ll take any day a text control file I can edit directly over a web interface which isn’t immediately obvious (I did not find Cherokee’s to be so).
The relative maturity of these various products does matter; I would be inclined to use the simplest product possible if given a choice.
I run my Python web apps (using the QP[3] web framework, a cousin to Quixote) behind either Apache or lighttpd “mod_scgi” front ends … mostly lighttpd.
[1] SCGI: http://python.ca/scgi/
[2] SCGI Protocol: http://python.ca/scgi/protocol.txt
[3] http://www.mems-exchange.org/software/
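To show why SCGI is considered simpler than FastCGI, here is a small sketch of how a front-end web server encodes a request according to the protocol document in [2]: the headers are NUL-terminated name/value pairs wrapped in a netstring, with `CONTENT_LENGTH` first and `SCGI` set to `1`, followed by the raw body. The function names are my own, and this covers only the request side.

```python
def netstring(payload: bytes) -> bytes:
    """Wrap bytes as a netstring: b'<length>:<payload>,'."""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def encode_scgi_request(headers, body=b""):
    """Build the bytes a web server would send to an SCGI application.

    `headers` is a list of (name, value) string pairs. CONTENT_LENGTH and
    SCGI are generated here, so any caller-supplied copies are dropped.
    """
    items = [("CONTENT_LENGTH", str(len(body))), ("SCGI", "1")]
    items += [(k, v) for k, v in headers if k not in ("CONTENT_LENGTH", "SCGI")]
    block = b"".join(
        k.encode("latin-1") + b"\x00" + v.encode("latin-1") + b"\x00"
        for k, v in items
    )
    return netstring(block) + body
```

Encoding a POST to `/deepthought` with the body `What is the answer to life?` gives a 70-byte header block, so the result begins with `70:`. The application replies with an ordinary CGI-style response and the connection is closed, which is essentially the whole protocol.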
Incidentally, some long time ago now I was also fighting a memory leak in Apache which is what caused me to look at lighttpd in the first place. The one PHP application I have to support for clients is a web mail solution – I moved that app to lighttpd and my memory issues disappeared. Eventually I ended up migrating away from Apache entirely.
Now if only there were a decent web mail solution in Python I might be free of PHP entirely… possibly a worthy goal. If one worries about 18 security advisories for “lighttpd” then one should also worry about the 335 associated with Apache or more pointedly the 3749 entries associated with PHP.
Argh, I need to hurry up and do my tests and post a follow-up! Mea culpa.
@Michael Watkins:
SCGI sounds fine for Python apps, but once again I’m left with no clear “safe” way of supporting PHP. At least the one library I ran into quickly seemed awfully hackish (and I *hate* hacking PHP these days).
I love text config files for most things, but for some reason Cherokee’s web admin just clicked with me. Perhaps it’s because I’m a web developer first, sysadmin second. One less config file format to remember (especially Apache’s weird quasi-xml one) would be welcome. However, give me 6 months with Cherokee’s web admin, and I might be begging for vim access again.
I understand about the webmail solution in Python. I actually started writing one long ago but abandoned it when I switched jobs. The Edgewall crew started one called Posterity, but it looks very dead: http://posterity.edgewall.org/
@Graham Dumpleton – Thanks for the suggestions on doing non-invasive Python webapp restarts with mod_wsgi. I was indeed using embedded mode, and all’s well now that I’ve switched to daemon mode.
Have you tried IIS6?
@Seriously: Ha. I assume you’re joking/trolling, but actually I have administered an IIS6 server in the past. Had to run an ASP.NET application + PostgreSQL on a GoDaddy server. It was hellish to set up, obnoxious to maintain, and performed unbelievably badly.
I think the only reason it exists is because some developers can only code inside Visual Studio, and so the only web development platform available to them is ASP.NET + IIS. The Microsoft ghetto. *shudder*