Apache segmentation fault
And we are back. I had a spate of segfaults the other day.
Interestingly, they started on one VPS, then a few hours later started on another one.
Both VPSes have different memory allocations. One is not very busy (it started segfaulting first) and the other is very busy. Both had plenty of memory overhead -- they were nowhere near their limits.
But here's the thing... I have a third VPS that I only recently put ASL/mod_sec on, and it never segfaults, even with a full set of rules. Most strangely, it has the least amount of memory allocated to it of the three. In terms of load it is roughly between the two that segfault.
There is a visible difference between the VPS that does not segfault and the other two that do. On this "good" VPS, in the asl-web-gui, I see events going back one or two days in the dashboard.
On the other two VPSes, the list gets zapped at midnight.
All three are fully up to date. But it kind of seems that there's a difference somehow.
Is this expected or not? Could the new version be using the sqlite logging, and the other two not?
Obviously this has nothing to do with mod_sec itself, BUT if there's the possibility that something is different in this installation then maybe something else under the hood might also not have been updated somehow?
I'm clutching at straws here. This issue is really annoying me.
The cause isn't Zend version. It isn't suhosin version or config. It isn't apache version or config. It isn't php config. But something is different.
Faris.
--------------------------------
<advert>
If you want to rent a UK-based VPS that comes with friendly advice and support from a fellow ART fan, please get in touch.
</advert>
- mikeshinn
- Atomicorp Staff - Site Admin
- Posts: 4155
- Joined: Thu Feb 07, 2008 7:49 pm
- Location: Chantilly, VA
Can you diff your apache config files to see what config differences exist between the machines?
Also, are they all running the same Apache, Apache modules and all other libraries? Any differences there?
This sounds like a bug in Apache for sure.
Michael Shinn
Atomicorp - Security For Everyone
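A plain `diff` of two httpd.conf files also reports differences in comments and spacing. The sketch below, assuming the two files have already been read into strings, compares only the effective directives. The function names and sample contents are made up for illustration; a commented-out `LoadModule` line is treated as absent, so a disabled module shows up as a missing line.

```python
import difflib


def effective_directives(conf_text):
    """Return non-blank, non-comment lines with whitespace normalised."""
    lines = []
    for raw in conf_text.splitlines():
        line = " ".join(raw.split())
        if line and not line.startswith("#"):
            lines.append(line)
    return lines


def config_diff(text_a, text_b, name_a="a.conf", name_b="b.conf"):
    """Unified diff of the effective directives of two configs."""
    return list(difflib.unified_diff(
        effective_directives(text_a),
        effective_directives(text_b),
        fromfile=name_a, tofile=name_b, lineterm=""))


# Hypothetical sample contents, not taken from any real server.
good = "KeepAlive Off\n# a comment\nMaxClients   10\n"
bad = "KeepAlive On\nMaxClients 150\n#LoadFile /usr/lib/libxml2.so\n"

for line in config_diff(good, bad, "good.conf", "bad.conf"):
    print(line)
```

The same idea could be applied to the output of `rpm -qa` on each machine to compare installed library versions, though that is not shown here.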
No. I've not updated PHP beyond the CentosPlus version (5.1.6?).
I'll try to find time to do a diff, but in theory the configs are all almost identical, or at least they were until I started fiddling with MaxClients etc. a while ago. Certainly the same modules are disabled.
Faris.
--------------------------------
- mikeshinn
For what it's worth, I've definitely seen changes to MaxClients and to StartServers in the worker MPM cause segfaults on other systems we work with. I'll do some research.
Also, you probably mentioned this before, but are these 32-bit or 64-bit systems?
Michael Shinn
Atomicorp - Security For Everyone
Thanks Mike.
These are, technically, 32-bit CentOS 4 VPSes running on a 64-bit CentOS 5 hardware node.
You are also welcome to play with one of the VPSes that's segfaulting if you like.
Faris.
--------------------------------
- mikeshinn
Thanks, I will definitely take you up on the offer to check things out locally. I've been reading up on some issues with 64-bit, PHP, Apache and segfaults, so I wonder if this is related to that.
I'm off for the holidays but will check the forums this weekend.
Michael Shinn
Atomicorp - Security For Everyone
I've PMed you access details.
Here's a diff of the httpd.confs.
107 does not segfault. 101 does segfault.
Code:
[root@vz diffs]# diff httpd107.conf httpd101.conf
62a63
> ########CoreDumpDirectory /core
100,105c101,106
< StartServers 1
< MinSpareServers 1
< MaxSpareServers 5
< ServerLimit 10
< MaxClients 10
< MaxRequestsPerChild 4000
---
> StartServers 4
> MinSpareServers 4
> MaxSpareServers 8
> ServerLimit 150
> MaxClients 150
> MaxRequestsPerChild 1000
117c118
< MaxClients 10
---
> MaxClients 150
157c158
< LoadModule mime_magic_module modules/mod_mime_magic.so
---
> #LoadModule mime_magic_module modules/mod_mime_magic.so
191c192,193
< LoadFile /usr/lib/libxml2.so
---
> #LoadModule jk_module /usr/lib/httpd/modules/mod_jk.so
> #LoadFile /usr/lib/libxml2.so
193a196
> LoadModule frontpage_module /usr/lib/httpd/modules/mod_frontpage.so
FrontPage is not the culprit, however.
Here's another diff, this time 107 (does not segfault) against 112 (does segfault):
Code:
[root@vz diffs]# diff httpd107.conf httpd112.conf
73c73
< KeepAlive Off
---
> KeepAlive On
100,105c100,105
< StartServers 1
< MinSpareServers 1
< MaxSpareServers 5
< ServerLimit 10
< MaxClients 10
< MaxRequestsPerChild 4000
---
> StartServers 8
> MinSpareServers 8
> MaxSpareServers 16
> ServerLimit 250
> MaxClients 250
> MaxRequestsPerChild 3000
117c117
< MaxClients 10
---
> MaxClients 100
191c191
< LoadFile /usr/lib/libxml2.so
---
> #LoadFile /usr/lib/libxml2.so
Obviously there is a drastic difference in the basic MaxClients/ServerLimit etc. settings.
But I've tried KeepAlive on and off, and that makes no difference.
I've tried reducing and increasing MaxRequestsPerChild and that made no difference.
Faris.
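To put the MaxClients difference in the diffs above into memory terms, here is a rough back-of-the-envelope sketch. It assumes the first MaxClients value in each file belongs to the prefork MPM block (the neighbouring MinSpareServers/MaxSpareServers directives are prefork ones). The 30 MB per-child figure is a made-up placeholder: the real number depends on the loaded modules and rules and would need to be measured on each VPS.

```python
def worst_case_mb(max_clients, per_child_mb):
    """Upper bound on Apache memory if every allowed child is running."""
    return max_clients * per_child_mb


# ASSUMPTION: placeholder resident size of one httpd child, in MB.
PER_CHILD_MB = 30

# First MaxClients value from each httpd.conf in the diffs above.
max_clients_by_vps = {"107": 10, "101": 150, "112": 250}

for vps, max_clients in max_clients_by_vps.items():
    total = worst_case_mb(max_clients, PER_CHILD_MB)
    print(f"VPS {vps}: MaxClients {max_clients} -> up to {total} MB")
```

This is only a ceiling, since a lightly loaded server never spawns all of its children. It may still help explain why the VPS with MaxClients 10 behaves differently from the other two, but that is a guess rather than a diagnosis.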
--------------------------------
wah! Looks like the latest two sets of rules have taken apache over the memory edge again and I'm getting random segfaults every now and then no matter what (previously this would not have happened, at least with the domain-blacklist disabled).
I think I'll have to cut down the malware and jitp rules a bit.
Faris.
--------------------------------
- mikeshinn
The blacklists for spam are a backup to the spam rules, so if spam seems to be getting stopped by the 30_asl_antispam rules then turn off the domain-blacklists if you are really tight on RAM. We're working on integration with SpamAssassin for posts, which will make the rules less important, but it's gonna take time to work out the integration with the right third-party applications (WordPress, Tikis, Wikis, Joomla/Mambo, etc.). Basically we have to limit it to looking at externally facing posts only, and not every post, otherwise we might block internal stuff (like IT helpdesks reporting spam) or slow down the box by looking at non-spammy stuff.
Web spam is a huge problem for everyone so we're always working on ways to both stop it and to improve the process. Right now that means more RAM, but once we can do more with SpamAssassin that will make it less necessary for you to use the blacklists.
The blacklists are sort of a last line of defense - known bad domains that you should always block, but there are other rules that block those as well so you can live without them. The same goes for JITP patches - as you move down the file the rules get newer, so if you need to cut down rules you can eliminate the older stuff at the top.
In the future we will make it so you can set a window for JITP: only run patches up to 30 days old, 90 days, etc., to map to your patching cycle, so they will auto-prune, or "No more than 100 Just in Time Patches", or both (no more than 30 days and no more than 100, whichever is lower).
Michael Shinn
Atomicorp - Security For Everyone
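The "no more than 30 days and no more than 100, whichever is lower" window described above could be sketched roughly as follows. This is not how ASL implements or will implement it. The `(name, date_added)` tuples and the patch names are hypothetical, since the thread does not say how JITP rules record their age.

```python
from datetime import date, timedelta


def prune_patches(patches, today, max_age_days=30, max_count=100):
    """Keep the newest patches that satisfy both the age and count limits.

    patches: list of (name, date_added) tuples (hypothetical structure).
    """
    cutoff = today - timedelta(days=max_age_days)
    recent = [p for p in patches if p[1] >= cutoff]
    recent.sort(key=lambda p: p[1], reverse=True)  # newest first
    return recent[:max_count]


# Made-up example data.
patches = [
    ("patch-a", date(2009, 1, 20)),
    ("patch-b", date(2008, 12, 30)),
    ("patch-c", date(2008, 10, 1)),
    ("patch-d", date(2009, 1, 10)),
]
today = date(2009, 1, 21)

kept = prune_patches(patches, today, max_age_days=30, max_count=2)
print([name for name, _ in kept])
```

Applying the age filter first and the count cap second means whichever limit is stricter wins, which matches the "whichever is lower" wording.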
Thanks Mike,
Yes, the anti-spam rules are extremely useful.
I think I'll prune the jitp before anything else. There are loads of applications in there that aren't used by any customers.
The problem is that I'll need to do that every time there's an update, so the "auto-expire" feature you mention will be helpful.
Faris.
--------------------------------
I'm going to post this here rather than opening a new topic or a support case.....
I've not updated to new rules since Thursday/Friday.
On Saturday I updated ASL to 2.0.6 and updated the rules at the same time. As usual I disabled the domain-blacklist.
However, what is very unusual is the amount of memory that's being used now. It has dropped hugely, in addition to which I've not seen even one segfault after restarting apache following the new ruleset install (which is unusual).
mod_sec is still working and all is well as far as I can tell.
I know what the changelog/announce says about 2.0.6 -- there's nothing in there that should have caused any change on this front.
But could there be? Was anything else changed? What about some individual rule in the ruleset that might have been modified -- something particularly complex maybe?
Faris.
--------------------------------
And the mystery deepens!
I decided to enable the full ruleset on all our systems, including the one that would normally segfault within seconds of doing so.
Nothing. No segfaults. Weird or what?
BUT I did see some of these in /var/log/httpd/error_log, which I've never seen before:
Code:
ModSecurity: Error reading request body: Connection reset by peer [hostname "webmail.some-domain-on-our-server.co.uk"] [uri "/horde/imp/compose.php"]
and
Code:
ModSecurity: Multipart parsing error: Multipart: Final boundary missing. [hostname "webmail.some-domain-on-our-server.co.uk"]
I've never ever seen these errors before.
It is all webmail again as usual.
So I've disabled the domain-blacklist.txt list again.
I'm still seeing these though:
Code:
*** glibc detected *** free(): invalid next size (normal): 0xbeb4d8a0 ***
[Wed Jan 21 08:35:24 2009] [notice] child pid 7292 exit signal Aborted (6)
These have generally been a precursor to a spate of segfaults in the past. So far nothing though.
Faris.
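To keep an eye on how often children die, and from which signal, a small scan of error_log along these lines might help. It is a sketch only: the regular expression is based on the "child pid ... exit signal" line quoted above, and the segmentation fault line in the sample is a made-up illustration rather than something copied from these servers.

```python
import re
from collections import Counter

# Matches e.g. "child pid 7292 exit signal Aborted (6)".
CHILD_EXIT = re.compile(r"child pid (\d+) exit signal (.+?) \((\d+)\)")


def count_child_crashes(log_lines):
    """Count abnormal child exits in error_log lines, keyed by signal name."""
    counts = Counter()
    for line in log_lines:
        match = CHILD_EXIT.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


sample = [
    "*** glibc detected *** free(): invalid next size (normal): 0xbeb4d8a0 ***",
    "[Wed Jan 21 08:35:24 2009] [notice] child pid 7292 exit signal Aborted (6)",
    # Hypothetical line, added only to show a second signal type:
    "[Wed Jan 21 08:36:02 2009] [notice] child pid 7301 exit signal Segmentation fault (11)",
]

for signal_name, n in count_child_crashes(sample).items():
    print(signal_name, n)
```

On a real system the lines would come from something like `open("/var/log/httpd/error_log")`, the path mentioned in the post above.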
--------------------------------
- mikeshinn
You can download a copy of the first version of a tool that will autostart Apache when it sees processing failures:
http://downloads.prometheus-group.com/t ... _apache.pl
If you have other errors you want it to look for, let me know and I'll add them in.
Michael Shinn
Atomicorp - Security For Everyone
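The Perl tool linked above is not reproduced here and its logic is not known from this thread. Purely as an illustration of the general idea, the sketch below decides whether a restart looks warranted when several child crashes fall inside a short time window. The five-minute window, the threshold of three and the sample log lines are all assumptions, and the actual restart command is deliberately left out.

```python
import re
from datetime import datetime, timedelta

# Timestamp of an Apache 2.x "child pid ... exit signal" notice line.
STAMP = re.compile(r"^\[([^\]]+)\] \[notice\] child pid \d+ exit signal")


def crash_times(log_lines):
    """Extract the timestamps of abnormal child exits.

    Assumes English day/month names, as in the default C locale.
    """
    times = []
    for line in log_lines:
        match = STAMP.match(line)
        if match:
            times.append(
                datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y"))
    return times


def should_restart(times, now, window=timedelta(minutes=5), threshold=3):
    """True if at least `threshold` crashes happened within `window` of now."""
    recent = [t for t in times if now - window <= t <= now]
    return len(recent) >= threshold


# Hypothetical log excerpt in the format quoted earlier in the thread.
lines = [
    "[Wed Jan 21 08:35:24 2009] [notice] child pid 7292 exit signal Aborted (6)",
    "[Wed Jan 21 08:36:10 2009] [notice] child pid 7310 exit signal Aborted (6)",
    "[Wed Jan 21 08:37:45 2009] [notice] child pid 7322 exit signal Aborted (6)",
]
now = datetime(2009, 1, 21, 8, 38, 0)
print(should_restart(crash_times(lines), now))
```

A cron job could run a check like this every minute and call the init script only when it returns true, which avoids restarting on a single isolated crash.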