httpd dead but subsys locked - only websrvmng starts Apache

breun
Long Time Forum Regular
Posts: 2813
Joined: Sat Aug 20, 2005 9:30 am
Location: The Netherlands

Re: httpd dead but subsys locked - only websrvmng starts Apa

Post by breun

The one not behaving correctly is running httpd-2.2.17-1.el5.art because of segfault problems we had a couple of months ago. All or nearly all of our other CentOS 5 machines are running Apache httpd 2.2.3 as provided by CentOS.

Another difference with other servers we manage is that this one is still on Plesk 8.6.0, because this is an older Media Temple server and Plesk can't be upgraded by the client on those machines. Also unique to this machine is that it runs under Virtuozzo on the 2.6.9-023stab052.4-enterprise kernel, although we have a similar Media Temple server running fine under Virtuozzo on the 2.6.18-028stab085.3 kernel with Plesk 10.

Other than that I can't think of anything special on this machine right now.
Lemonbit Internet Dedicated Server Management
mikeshinn
Atomicorp Staff - Site Admin
Posts: 4149
Joined: Thu Feb 07, 2008 7:49 pm
Location: Chantilly, VA

Re: httpd dead but subsys locked - only websrvmng starts Apa

Post by mikeshinn

> The one not behaving correctly is running httpd-2.2.17-1.el5.art because of segfault problems we had a couple of months ago. All or nearly all of our other CentOS 5 machines are running Apache httpd 2.2.3 as provided by CentOS.
Correct me if I'm wrong, but as I recall you were running 2.2.3 the last time we checked, correct? If so, then 2.2.17 isn't the cause, since this also occurred with 2.2.3.
> 2.6.9-023stab052.4-enterprise kernel,
WOW! That's positively ancient! (See below, I think we have a likely candidate.)
> 2.6.9-023stab052.4-enterprise kernel, although we have a similar Media Temple server running fine under Virtuozzo on the 2.6.18-028stab085.3 kernel with Plesk 10.
That's a pretty big difference between the two. So the key differences are Plesk and the kernel. The strace you sent is hung up on a kernel function call, and that's the blocker. I think you just found the most likely root cause candidate: your kernel is killing your restarts, and they take so long that things get all hung up. My advice would be to get that kernel upgraded. You have some ancient code there, and it has known performance problems.
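
For anyone wanting to double-check the "hung up on a kernel function call" part without strace, here is a rough Python sketch that reads a process's scheduler state and wait channel from /proc. It is only an illustration: the helper names are made up, it assumes a Python 3 interpreter is available on the box, and inside a Virtuozzo container wchan may be blank or unreadable. A state of "D" (uninterruptible sleep) on an httpd process that stays that way during a restart would be consistent with it being stuck inside the kernel.

```python
import os

def parse_state(status_text):
    """Return the one-letter state code from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            fields = line.split()
            return fields[1] if len(fields) > 1 else None
    return None

def describe(pid, proc_root="/proc"):
    """Return (state, wchan) for a pid; wchan is 'unavailable' if unreadable."""
    base = os.path.join(proc_root, str(pid))
    with open(os.path.join(base, "status")) as f:
        state = parse_state(f.read())
    try:
        with open(os.path.join(base, "wchan")) as f:
            # Empty or "0" generally means the process is not waiting.
            wchan = f.read().strip() or "-"
    except OSError:
        wchan = "unavailable"
    return state, wchan
```

Calling describe() with the PID of a stuck httpd process a few times, a second or two apart, shows whether it is sitting in the same place each time.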

Re: httpd dead but subsys locked - only websrvmng starts Apa

Post by breun

EL4 is also still on 2.6.9. :)

I'm pretty sure that Media Temple will just tell the client to get a new VPS if they want to run on a newer kernel. Can you think of any way to positively verify that the kernel is really the problem here? And how could it be that it worked before and doesn't now? The kernel didn't change.

Re: httpd dead but subsys locked - only websrvmng starts Apa

Post by mikeshinn

> EL4 is also still on 2.6.9. :)
Heh, yep, and soon to be DOA, or requiring an expensive Extended Life Cycle Support (ELS) subscription (which is limited to specific packages and hardware):

http://wiki.centos.org/FAQ/General#head ... dde5b75e6d

https://access.redhat.com/support/polic ... es/errata/

And with a 7 year support cycle, that's a pretty old setup! :-)
> I'm pretty sure that Media Temple will just tell the client to get a new VPS if they want to run on a newer kernel. Can you think of any way to positively verify that the kernel is really the problem here? And how could it be that it worked before and doesn't now? The kernel didn't change.
If it's a VPS, then is it safe to assume there are other VPSes on the same system? If so, then this being transient means it's likely caused by something changing on the host node, or maybe one of the other VPSes is straining the system. That's damn hard to show from inside the VPS; you're playing inside a chroot and an abstraction. This is really something you have to look at from the host node.

The downside is that 2.6.9 is positively ancient, and all the virtualization technology built around it is also really old and many generations behind what's used now, so you're very unlikely to get Parallels or Red Hat to put much effort into fixing it, but it can't hurt to ask. Just be prepared for them to suggest you upgrade, given the short window of life left on EL4.

Of course, it could just be bad tuning of the VPS or host node, or a lack of resources. With this being transient, that would be my guess: the VPS isn't getting what it needs to perform well. In which case DEFINITELY open a case with Parallels. This might be a simple problem of tuning.
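
One thing that can usually be checked from inside a Virtuozzo/OpenVZ container is /proc/user_beancounters, where a non-zero failcnt column means the container ran into one of its resource limits. Below is a minimal Python sketch that lists the counters with failures. It assumes the usual column layout (resource, held, maxheld, barrier, limit, failcnt), that the file exists on this kernel, and that you are root; the function names are just for illustration.

```python
def failed_counters(text):
    """Return {resource: failcnt} for every beancounter with failcnt > 0."""
    failures = {}
    for line in text.splitlines():
        fields = line.split()
        # The first row of each container block starts with "<ctid>:".
        if fields and fields[0].endswith(":"):
            fields = fields[1:]
        # Data rows are: resource held maxheld barrier limit failcnt
        if len(fields) != 6:
            continue
        try:
            failcnt = int(fields[5])
        except ValueError:
            continue  # not a data row
        if failcnt > 0:
            failures[fields[0]] = failcnt
    return failures

def read_beancounters(path="/proc/user_beancounters"):
    """Read the beancounters file (usually root-only) and report failures."""
    with open(path) as f:
        return failed_counters(f.read())
```

Comparing the result of read_beancounters() before and after a failed Apache restart would show whether any failcnt values are climbing. That would point at VPS limits rather than a kernel bug, and it is concrete evidence to hand to the host's support.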

Or, like I said, it could be an actual bug in the kernel (all the spinlock stuff changed since 2.6.9, and for good reason) and they may not be willing (or able) to fix it without you upgrading. The code you are using is so old it's got dust on it! :-(

Free advice: This VPS has been causing you grief for some time, and EL4 has all sorts of baggage of its own that was only fixed in EL5 and EL6. It's really a losing battle trying to pin down the cause when the OS vendors are getting ready to drop support soon, and then everyone else. You're going to have to upgrade soon if you want any support, so maybe this is the kicker to start planning. I assume there are some good reasons you can't upgrade, or you would have already. Wish I had an easy answer on this one. The clock is ticking though, and eventually the day is going to come when it's just EOL, so no matter what the solution to this problem is, an upgrade is in this VPS's future. :-)

Now if you must stick with this old code, open a case with Parallels and don't give them ammo to blow you off. It's their kernel, and if they still support it, get them looking at this. If they don't support it, well, put a fork in it, it's done. Move on to a supported platform.

Whatever you do, don't tell them "well if I disable this it goes away". Unfortunately, that's just asking for the old doctor joke response from the support person: ;-)

Patient: "Doctor, it hurts when I do this."
Doctor: "Then don't do that. That'll be $50."

Just tell them Apache isn't restarting correctly, and leave it at that. This is definitely a problem being caused down at the kernel or VPS configuration level. I can see in the strace you sent that it's stuck waiting, for what, who knows (you're in the VPS, so we can't see the full system). Disabling modsec is just telling Apache to do less work, which is just hiding the disease by treating the symptom. It might be a configuration problem with the VPS, or it could be a kernel bug, or it could be something else. Hard to say for sure from inside the VPS; we're just staring into a mirror. We need to get behind it.

If they insist on pushing you off because you are using 2.2.17, roll back to the buggy 2.2.3, play dumb, and let us know what they say. My guess is it's a configuration problem with the VPS. It's transient, so I think the host node is just not giving the VPS enough resources or something.