Apache segmentation fault

**faris** » Mon Dec 22, 2008 2:16 pm

And we are back. I had a spate of segfaults the other day.

Interestingly, they started on one VPS, then a few hours later started on another one.

The two VPSes have different memory allocations. One is not very busy (it started segfaulting first) and the other is very busy. Both had plenty of memory headroom -- they were nowhere near their limits.

But here's the thing ... I have a third VPS that I only recently put ASL/mod_sec on, and it never segfaults, even with a full set of rules. Most strangely, it has the least memory allocated to it of the three. In terms of load it is roughly between the two that segfault.

There is a visible difference between the VPS that does not segfault and the other two that do. On this "good" VPS, in the asl-web-gui, I see events going back one or two days in the dashboard.

On the other two VPSes, the list gets zapped at midnight.

All three are fully up to date. But it kind of seems that there's a difference somehow.

Is this expected or not? Could the new version be using the sqlite logging, and the other two not?

Obviously this has nothing to do with mod_sec itself, BUT if there's the possibility that something is different in this installation then maybe something else under the hood might also not have been updated somehow?

I'm clutching at straws here. This issue is really annoying me.

The cause isn't Zend version. It isn't suhosin version or config. It isn't apache version or config. It isn't php config. But something is different.

Faris.

**mikeshinn** » Mon Dec 22, 2008 2:53 pm

Can you diff your apache config files to see what config differences exist between the machines?

Also, are they all running the same apache, apache modules and all other libraries? Any differences there?

This sounds like a bug in apache for sure.
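
One way to do that comparison while ignoring comments and whitespace is sketched below in Python. It is only a rough sketch: the function names (`active_directives`, `config_diff`, `loaded_modules`) and the two sample snippets are made up for illustration, it does not follow Include lines or understand `<IfModule>` containers, and in practice you would read the real httpd.conf files instead of inline strings.

```python
import difflib


def active_directives(conf_text):
    """Return non-blank, non-comment lines with whitespace normalised."""
    lines = []
    for raw in conf_text.splitlines():
        line = " ".join(raw.split())
        if line and not line.startswith("#"):
            lines.append(line)
    return lines


def config_diff(text_a, text_b, name_a="a.conf", name_b="b.conf"):
    """Unified diff of the active directives only (no context lines)."""
    return list(difflib.unified_diff(
        active_directives(text_a), active_directives(text_b),
        fromfile=name_a, tofile=name_b, lineterm="", n=0))


def loaded_modules(conf_text):
    """Set of module names from uncommented LoadModule lines."""
    mods = set()
    for line in active_directives(conf_text):
        parts = line.split()
        if parts[0].lower() == "loadmodule" and len(parts) >= 3:
            mods.add(parts[1])
    return mods


# Hypothetical sample snippets, not real config files.
GOOD = """StartServers 1
MaxClients     10
# a comment that should be ignored
LoadModule mime_magic_module modules/mod_mime_magic.so
"""
BAD = """StartServers 4
MaxClients 150
#LoadModule mime_magic_module modules/mod_mime_magic.so
LoadModule frontpage_module /usr/lib/httpd/modules/mod_frontpage.so
"""

for line in config_diff(GOOD, BAD, "good.conf", "bad.conf"):
    print(line)
print("only in good:", sorted(loaded_modules(GOOD) - loaded_modules(BAD)))
print("only in bad: ", sorted(loaded_modules(BAD) - loaded_modules(GOOD)))
```

Comparing the set of loaded modules separately makes a commented-out LoadModule line stand out even when the surrounding lines have shifted.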

**scott** » Mon Dec 22, 2008 7:44 pm

And did you upgrade to php 5.2.8 by any chance? I've had some reports of stability issues with it on 32-bit C4 boxes.

**faris** » Tue Dec 23, 2008 2:30 pm

No. I've not updated PHP beyond the CentosPlus version (5.1.6?).

I'll try to find time to do a diff, but in theory the configs are all almost identical, or at least they were until I started fiddling with MaxClients etc. a while ago. Certainly the same modules are disabled.

Faris.

**mikeshinn** » Tue Dec 23, 2008 10:48 pm

For what it's worth, I've definitely seen changes to MaxClients and to StartServers in the worker MPM cause segfaults on other systems we work with. I'll do some research.

Also you probably mentioned this before but are these 32bit or 64bit systems?

**faris** » Wed Dec 24, 2008 11:58 am

Thanks mike.

These are, technically, 32-bit Centos 4 VPSes running on a 64-bit Centos 5 hardware node.

You are also welcome to play with one of the VPSes that's segfaulting if you like.

Faris.

**mikeshinn** » Wed Dec 24, 2008 2:33 pm

Thanks, I will definitely take you up on the offer to check things out locally. I've been reading up on some issues with 64bit, php, apache and segfaults, so I wonder if this is related to that.

I'm off for the holidays but will check the forums this weekend.

**faris** » Fri Dec 26, 2008 11:08 am

I've PMed you access details.

Here's a diff of the httpd.confs.

107 does not segfault. 101 does segfault.

Code:

[root@vz diffs]# diff httpd107.conf httpd101.conf
62a63
> ########CoreDumpDirectory /core
100,105c101,106
< StartServers       1
< MinSpareServers    1
< MaxSpareServers    5
< ServerLimit       10
< MaxClients        10
< MaxRequestsPerChild  4000
---
> StartServers       4
> MinSpareServers    4
> MaxSpareServers    8
> ServerLimit       150
> MaxClients        150
> MaxRequestsPerChild  1000
117c118
< MaxClients        10
---
> MaxClients        150
157c158
< LoadModule mime_magic_module modules/mod_mime_magic.so
---
> #LoadModule mime_magic_module modules/mod_mime_magic.so
191c192,193
< LoadFile /usr/lib/libxml2.so
---
> #LoadModule jk_module /usr/lib/httpd/modules/mod_jk.so
> #LoadFile /usr/lib/libxml2.so
193a196
> LoadModule frontpage_module /usr/lib/httpd/modules/mod_frontpage.so

FrontPage is not the culprit, however.

Here's another diff, this time 107 (does not segfault) against 112 (does segfault):

Code:

[root@vz diffs]# diff httpd107.conf httpd112.conf
73c73
< KeepAlive Off
---
> KeepAlive On
100,105c100,105
< StartServers       1
< MinSpareServers    1
< MaxSpareServers    5
< ServerLimit       10
< MaxClients        10
< MaxRequestsPerChild  4000
---
> StartServers       8
> MinSpareServers    8
> MaxSpareServers    16
> ServerLimit       250
> MaxClients        250
> MaxRequestsPerChild  3000
117c117
< MaxClients        10
---
> MaxClients        100
191c191
< LoadFile /usr/lib/libxml2.so
---
> #LoadFile /usr/lib/libxml2.so

Obviously there is a drastic difference in the basic MaxClients/ServerLimit etc. stuff.

But I've tried KeepAlive on and off, and that makes no difference.
I've tried reducing and increasing MaxRequestsPerChild and that made no difference.
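
For a sense of scale on those numbers: assuming the prefork MPM, each child serves one request at a time, so MaxClients caps the number of children and a crude upper bound on Apache memory is MaxClients times the per-child size. A quick sketch of that arithmetic follows. The 30 MB per child and the 1024 MB budget are purely assumed figures (measure your own), the helper names are made up, and shared pages mean this overestimates real usage.

```python
def worst_case_mb(max_clients, per_child_mb):
    """Crude upper bound: every allowed child running at the given size."""
    return max_clients * per_child_mb


def max_clients_for(budget_mb, per_child_mb):
    """Largest MaxClients whose worst case still fits in budget_mb."""
    return budget_mb // per_child_mb


PER_CHILD_MB = 30  # ASSUMPTION: not measured on any of the VPSes in this thread

# MaxClients values taken from the first hunk of the diffs above.
for vps, max_clients in [("107", 10), ("101", 150), ("112", 250)]:
    bound = worst_case_mb(max_clients, PER_CHILD_MB)
    print(f"VPS {vps}: MaxClients {max_clients} -> up to {bound} MB")

# Hypothetical 1024 MB budget for Apache.
print("MaxClients that fits in 1024 MB:", max_clients_for(1024, PER_CHILD_MB))
```

This only bounds the worst case. It says nothing about why a child crashes, but it does show how far apart the three configs are if every slot fills up.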

Faris.

**faris** » Wed Dec 31, 2008 2:17 pm

wah! Looks like the latest two sets of rules have taken apache over the memory edge again and I'm getting random segfaults every now and then no matter what (previously this would not have happened, at least with the domain-blacklist disabled).

I think I'll have to cut down the malware and jitp rules a bit.

Faris.

**mikeshinn** » Wed Dec 31, 2008 5:20 pm

The blacklists for spam are a backup to the spam rules, so if spam seems to be getting stopped by the 30_asl_antispam rules then turn off the domain-blacklists if you are really tight on RAM. We're working on integration with spamassassin for posts, which will make the rules less important, but it's gonna take time to work out the integration with the right third party applications (Wordpress, Tikis, Wikis, Joomla/Mambo, etc.). Basically we have to limit it to looking at externally facing posts only, not every post, otherwise we might block internal stuff (like IT helpdesks reporting spam) or slow down the box by looking at non-spammy stuff.

Web spam is a huge problem for everyone so we're always working on ways to both stop it and to improve the process. Right now that means more RAM, but once we can do more with spamassassin that will make it less necessary for you to use the blacklists.

The blacklists are sort of a last line of defense: known bad domains that you should always block. There are other rules that block those as well, so you can live without them. The same goes for JITP patches: as you move down the file the rules get newer, so if you need to cut down rules you can eliminate the older stuff at the top.

In the future we will make it so you can set a window for JITP: only run patches up to 30 days old, 90 days old, etc., to map to your patching cycle so they will auto-prune, or "no more than 100 Just in Time Patches", or both (no more than 30 days and no more than 100, whichever is lower).
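
The "age window, count cap, or both" idea can be sketched roughly like this in Python. To be clear, this is not how ASL does or will do it. The `Patch` record, the `select_patches` function and the sample rule ids and dates are all hypothetical, and it assumes each patch carries a date, which the rule files may not actually record.

```python
from datetime import date, timedelta
from typing import NamedTuple


class Patch(NamedTuple):
    rule_id: str   # hypothetical identifier
    added: date    # hypothetical "date the patch was published"


def select_patches(patches, today, max_age_days=None, max_count=None):
    """Keep the newest patches that satisfy every limit that is set."""
    kept = sorted(patches, key=lambda p: p.added, reverse=True)
    if max_age_days is not None:
        cutoff = today - timedelta(days=max_age_days)
        kept = [p for p in kept if p.added >= cutoff]
    if max_count is not None:
        kept = kept[:max_count]
    return kept


# Made-up sample data.
PATCHES = [
    Patch("jitp-a", date(2009, 1, 15)),
    Patch("jitp-b", date(2008, 12, 30)),
    Patch("jitp-c", date(2008, 11, 1)),
    Patch("jitp-d", date(2008, 6, 1)),
]
TODAY = date(2009, 1, 21)

print([p.rule_id for p in select_patches(PATCHES, TODAY, max_age_days=30)])
print([p.rule_id for p in select_patches(PATCHES, TODAY, max_age_days=90, max_count=2)])
```

Applying the age filter first and the count cap second gives the "whichever is lower" behaviour, since a patch has to pass both limits to be kept.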

**faris** » Thu Jan 01, 2009 10:41 am

Thanks Mike,

Yes, the anti-spam rules are extremely useful.

I think I'll prune the jitp before anything else. There are loads of applications in there that aren't used by any customers.

The problem is that I'll need to do that every time there's an update, so the "auto-expire" feature you mention will be helpful.

Faris.

**faris** » Sun Jan 18, 2009 5:35 pm

I'm going to post this here rather than opening a new topic or a support case.....

I've not updated to new rules since Thursday/Friday.

On Saturday I updated ASL to 2.0.6 and updated the rules at the same time. As usual I disabled the domain-blacklist.

However, what is very unusual is the amount of memory that's being used now. It has dropped hugely, in addition to which I've not seen even one segfault after restarting apache following the new ruleset install (which is unusual).

mod_sec is still working and all is well as far as I can tell.

I know what the changelog/announce says about 2.0.6 -- there's nothing in there that should have caused any change on this front.

But could there be? Was anything else changed? What about some individual rule in the ruleset that might have been modified -- something particularly complex maybe?

Faris.

**faris** » Wed Jan 21, 2009 7:17 am

And the mystery deepens!

I decided to enable the full ruleset on all our systems, including the one that would normally segfault within seconds of doing so.

Nothing. No segfaults. Weird or what?

BUT I did see some of these in /var/log/httpd/error_log, which I've never seen before:

Code:

ModSecurity: Error reading request body: Connection reset by peer [hostname "webmail.some-domain-on-our-server.co.uk"] [uri "/horde/imp/compose.php"]

and

Code:

ModSecurity: Multipart parsing error: Multipart: Final boundary missing. [hostname "webmail.some-domain-on-our-server.co.uk"]

I've never ever seen these errors before.

It is all webmail again as usual.

So I've disabled the domain-blacklist.txt list again.

I'm still seeing these though:

Code:

*** glibc detected *** free(): invalid next size (normal): 0xbeb4d8a0 ***
[Wed Jan 21 08:35:24 2009] [notice] child pid 7292 exit signal Aborted (6)

These have generally been a precursor to a spate of segfaults in the past. So far nothing though.
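
To keep an eye on whether those precursors are building up, something like the following Python sketch could tally the relevant error_log lines. It is only a sketch. The `summarize` function and the category labels are made up, the glibc and "Aborted (6)" sample lines are the ones quoted above, and the "Segmentation fault (11)" sample line is an assumed example in Apache's usual child-exit format rather than something copied from these logs.

```python
import re
from collections import Counter

# Matches e.g. "[notice] child pid 7292 exit signal Aborted (6)"
CHILD_EXIT = re.compile(r"\[notice\] child pid (\d+) exit signal (.+?) \((\d+)\)")


def summarize(lines):
    """Count child exits per signal, glibc heap reports and ModSecurity messages."""
    counts = Counter()
    for line in lines:
        match = CHILD_EXIT.search(line)
        if match:
            counts[f"{match.group(2)} ({match.group(3)})"] += 1
        elif "*** glibc detected ***" in line:
            counts["glibc heap report"] += 1
        elif "ModSecurity:" in line:
            counts["ModSecurity message"] += 1
    return counts


SAMPLE = [
    "*** glibc detected *** free(): invalid next size (normal): 0xbeb4d8a0 ***",
    "[Wed Jan 21 08:35:24 2009] [notice] child pid 7292 exit signal Aborted (6)",
    # ASSUMED example line, not taken from the thread:
    "[Wed Jan 21 08:40:02 2009] [notice] child pid 7301 exit signal Segmentation fault (11)",
    "ModSecurity: Multipart parsing error: Multipart: Final boundary missing.",
]

for kind, count in sorted(summarize(SAMPLE).items()):
    print(f"{count:4d}  {kind}")
```

On a real box you would feed it the lines of /var/log/httpd/error_log instead of the inline sample.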

Faris.

**faris** » Wed Jan 21, 2009 7:44 am

I spoke too soon. I did an asl -u just now and allowed all the rules, and BAM, segfaults again.

Oh well.

Faris.

**mikeshinn** » Wed Jan 21, 2009 5:44 pm

You can download a copy of the first version of a tool that will autostart apache when it sees processing failures:

http://downloads.prometheus-group.com/t ... _apache.pl

If you have other errors you want it to look for, let me know and I'll add them in.