Apache segmentation fault
Argh! And it is getting worse. All since yesterday.
But I've found something very interesting.
It didn't occur to me until now, but I have another VPS running mod_sec 2.1.0 (non-ASL) on an old rule set and it is not segfaulting and not producing errors of any kind whatsoever.
The VPS in question has been allocated very little memory (1 GB).
It has the same apache config as the ones that are giving the errors.
On the other hand, it has PHP 4 from the CentOS Base repo. The ones giving the error are running PHP 5 from CentOSPlus.
Scott - do you think I should try compiling mod_sec from the SRPM on the problem systems, just in case?
Faris.
--------------------------------
<advert>
If you want to rent a UK-based VPS that comes with friendly advice and support from a fellow ART fan, please get in touch.
</advert>
Yup. I will definitely do so.
But I have some interesting news in the meantime.
I'm convinced the problem *for me and maybe other vps users* is down to memory usage in some way.
I've been asking myself what difference there was between the VPS that's working fine and the VPSes that are not (other than the PHP version and mod_sec version). And the answer is the rules.
I also asked myself what changed between two days ago and yesterday in the VPSes that were "OK" before but started causing lots of problems yesterday/today. And the answer, again, is the rules.
So earlier today I "disabled" (reduced to three entries) the domain-blacklist for mod_sec, because I figured it was a major resource hog all by itself (it is the largest of the config files) and the least critical thing I could disable.
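(If anyone wants to check the same thing on their own box, something like the following will show which rule files are the biggest; the directory is an assumption, so point it at wherever your mod_security rules actually live.)

# List the rule files largest-first (path is a guess; adjust to suit).
ls -lSh /etc/httpd/modsecurity.d/
# Or count entries per file rather than bytes.
wc -l /etc/httpd/modsecurity.d/* | sort -rn | head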
Indeed, have a look at this (per-process figures from top; the columns are PID, USER, PR, NI, VIRT, RES, SHR, S, %CPU, %MEM, TIME+ and COMMAND):
Before:
31795 apache 16 0 199m 168m 6436 S 4.3 4.1 0:04.39 httpd
After:
14181 apache 15 0 119m 88m 3968 S 1.7 2.2 0:00.05 httpd
And for comparison, the vps that has never had any problems but uses an old ruleset and old mod_sec:
7506 apache 15 0 61492 40m 5988 S 12.3 1.9 2:18.60 httpd
As you can see, the reduction in memory usage is huge with the blacklist rules disabled.
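(For anyone who would rather total this up than eyeball individual processes in top, a generic one-liner along these lines does the job; it is not from any of the posts here.)

# Sum the resident set size (RSS) of every running httpd process, in MB.
ps -o rss= -C httpd | awk '{sum += $1} END {printf "%.0f MB total RSS\n", sum/1024}'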
And not one single segfault or rule failure so far. I'm crossing my fingers that it will stay this way, and I will also restart the VPS tomorrow. If I get no segfaults after doing that, I'll know it is just that my VPSes do not react well to large rulesets.
Faris.
--------------------------------
<advert>
If you want to rent a UK-based VPS that comes with friendly advice and support from a fellow ART fan, please get in touch.
</advert>
Well, I can now say with 100% certainty that disabling the domain-blacklist solved the problem for me. Remember, it isn't the rules themselves that are the problem - it is the amount of memory they use.
I've done everything I can to provoke a segfault or even just a rule failure and I have failed completely.
I did see this though:
==> /var/log/httpd/error_log <==
*** glibc detected *** free(): invalid next size (normal): 0xbf6940c8 ***
[Sat Sep 27 14:43:26 2008] [notice] child pid 6044 exit signal Aborted (6)
But just the one, and it happened all by itself, not when I was provoking a problem. I've never seen that error before.
I think it would be useful to see how many of the entries in the original domain-blacklist actually still resolve. I'll have a go at doing that a little later. It seems pointless to keep domains that have been killed off long ago.
Faris.
--------------------------------
<advert>
If you want to rent a UK-based VPS that comes with friendly advice and support from a fellow ART fan, please get in touch.
</advert>
OK, with a quick and dirty bash script that basically just does an nslookup on every domain in domain-blacklist.txt, roughly 32% of the domains no longer resolve.
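A rough sketch of that kind of check (this is not the exact script; it assumes one domain per line in domain-blacklist.txt and uses host rather than nslookup, purely because host's exit status is easier to test):

#!/bin/bash
# Rough sketch of the "does it still resolve?" check.
# Assumes one domain per line; adjust the path and the comment handling
# to match your copy of domain-blacklist.txt.
LIST=/etc/httpd/modsecurity.d/domain-blacklist.txt
total=0
dead=0
while read -r domain; do
    [ -z "$domain" ] && continue            # skip blank lines
    case "$domain" in \#*) continue ;; esac # skip comment lines
    total=$((total + 1))
    if ! host "$domain" > /dev/null 2>&1; then
        dead=$((dead + 1))
        echo "no longer resolves: $domain"
    fi
done < "$LIST"
echo "$dead of $total domains no longer resolve"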
I'd have thought it would be sensible to remove them. What do you think Scott?
Faris.
--------------------------------
<advert>
If you want to rent a UK-based VPS that comes with friendly advice and support from a fellow ART fan, please get in touch.
</advert>
Wish I had caught this thread a little sooner. I was having nearly constant segfaults under high load for quite some time, starting maybe a month or more ago.
I, too, noticed the huge amount of memory each Apache process was using with every rule loaded, including the RBL, etc. I was also able to find that the RBL checks were quite the memory hog, and I disabled them as well.
I didn't notice the segfaults stopping right away (it wasn't a showstopper, more of a nuisance in the logs and a reason to bang my head against the wall wondering why). Anyhow, the first thing I noticed was a drastic decrease in the page space I would wind up using on any given day. I went through all of the anacron/cron.* jobs and removed what wasn't needed, along with some improved MPM tuning within Apache.
After that I did check for segfaults and they were nowhere to be found, so chances are it was simply removing the bloated (I hate to call it that, but even if it has reason to use that much memory, it is) RBL checks that did it. My Apache memory usage was really high prior to this, even with a very minimal set of modules loaded, just enough to meet my own and my customers' requirements (about half of what's loaded out of the box).
I also ran mem_cache, yet I never took it off until after the fact. That obviously uses more RAM too, so it could be a combination of things if you simply don't have enough RAM to support the concurrent 200-400 MB memory footprint of each httpd process. I was working with 2.5-3 GB (Xen ballooning, based on what was available).
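To put rough numbers on that (the figures below are illustrative assumptions, not values quoted by anyone in this thread): with httpd processes at around 300 MB each and roughly 2.5 GB of RAM, the prefork MPM can only safely run somewhere around eight children, so the tuning ends up looking something like this:

# Illustrative prefork settings for a ~2.5 GB VPS where each httpd process
# uses roughly 300 MB. The exact numbers are assumptions.
<IfModule prefork.c>
    StartServers          2
    MinSpareServers       2
    MaxSpareServers       4
    ServerLimit           8
    MaxClients            8
    MaxRequestsPerChild   1000
</IfModule>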
I wish I could have read this before I had that issue. It seemed to go on forever, and I looked in all the wrong places before stumbling across a fix.
Aus-city - I'm devastated to hear that you are still having problems, although in a way it is good to know that I had hit the nail on the head with memory (or memory load, I suppose) being the real cause of the problem.
[mind you, this does not explain how Scott was able to get segfaults with NO RULES running]
We have a number of sites on our servers that seem to be skiddie magnets. The removal of the domain-blacklist has not changed the pattern of the attacks listed. To be honest, I think the biggest use of the domain blacklist is preventing comment spam.
So I would say that it is reasonably safe to reduce the size of that file in any way you want, or even empty it out completely.
Remember that you need to do so in /var/asl/rules/modsecurity (or a similar path) rather than just /etc/httpd/modsecurity.d/, because something somewhere in ASL reads the file from the /var/asl version and writes it to the /etc/httpd/modsecurity.d/ copy every so often. I was getting very confused until I figured that out - it was as if the rule files were regenerating themselves without any ASL updates taking place.
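A hedged sketch of that edit, then (both paths are assumptions and vary between ASL versions, so check where your install actually keeps the file):

# Trim the ASL-managed copy so it doesn't get regenerated at full size,
# then mirror it to the copy Apache actually loads. Paths are assumptions.
cd /var/asl/rules/modsecurity
cp -a domain-blacklist.txt domain-blacklist.txt.orig    # keep a backup
head -n 3 domain-blacklist.txt.orig > domain-blacklist.txt
cp -f domain-blacklist.txt /etc/httpd/modsecurity.d/domain-blacklist.txt
service httpd restart    # or "graceful", to avoid dropping in-flight requests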
Faris.
--------------------------------
<advert>
If you want to rent a UK-based VPS that comes with friendly advice and support from a fellow ART fan, please get in touch.
</advert>
P.S. If you remove/reduce the domain-blacklist, I think you'll find you can put mod_mem_cache back in.
Having said that, its removal has made no difference in the performance of our servers. I think you may need to actually put some configuration info in to make it do anything useful. Similarly, mod_deflate is loaded by default, but you need to put some actual configuration in to make it do anything.
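For reference, the sort of minimal configuration that actually switches those two modules on looks something like this (the values are illustrative only):

# Neither module does anything useful until it is explicitly configured.
<IfModule mod_mem_cache.c>
    CacheEnable mem /
    MCacheSize           4096    # cache size in KBytes
    MCacheMaxObjectCount 100
</IfModule>

<IfModule mod_deflate.c>
    AddOutputFilterByType DEFLATE text/html text/plain text/css text/xml
</IfModule>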
--------------------------------
<advert>
If you want to rent a UK-based VPS that comes with friendly advice and support from a fellow ART fan, please get in touch.
</advert>
But aus-city still got the problem with mod-cache off, according to his post above...

aus-city wrote:
Any update to the blacklist to squash this bug?
Even with mod-cache off I still suffer from this during high server loads, especially when worms and stuff are ripe.
Will we be notified about restoring mod-cache once this is done?