Apache segmentation fault

mikeshinn · Sun Mar 06, 2011 1:52 pm

So if I understand your problem, you have a case where ASL is correctly reporting that your OS vendor is reporting that you are missing updates. You install these updates, because you want to make sure your system is secure, and one of those updates from your OS vendor (or some other source) causes a problem on your system. You are unable to test these updates before you install them and are disappointed that there is some problem, and if I understand you correctly (please correct me if I misunderstood) you want to know before the fact that those updates will not break your system or at least to just not have to live with bugs, etc.

In that case, I have four solutions for you that will solve this problem handily.

Executive Summary

Keep in mind that the only way to have reasonable assurance that something will work is for someone to test that it works. So a priori assurance that an update or installation won't break anything requires a lot of testing. Someone must always do the testing; that's the only way to know. No software is perfect. People are flawed creatures, people write software, and software is complex, so you will *always* have bugs. Your solutions to this requirement are:

1) Switch everything you use to commercially supported software that has all been tested by individuals with the expertise to ensure that all the components will work without flaws. This is possible, and it's basically transferring the responsibility of testing onto a third party. All reliability is accomplished via testing and QA activities. So, if you want a full package that does this, you would want to switch your OS to something like Red Hat Enterprise Linux, switch all your web applications to ones that are certified for that platform (and also supported and tested for it), and make sure you never run anything else.

2) Test your changes before you implement them.

3) Pay someone else to test your changes before you implement them.

4) Install the updates, watch for issues, and roll back if you find one.

Discussion

So, now I'll explain why these are your only choices. Here are the repositories you have configured on your system, and what role each may play in your current situation:

/etc/yum.repos.d

-rw-r--r-- 1 root root 442 Jul 16 2010 asl.repo

That's the ASL repo, commercially supported by us. That repo only contains ASL components, and not any operating system components (like PHP, Apache, MySQL, etc.). It won't change your OS, and these packages are tested and supported to run correctly on the base OS they are installed on.

-rw-r--r-- 1 root root 1337 Aug 19 2010 atomic.repo

That is our free and unsupported open source repository. It contains lots of software, which you should always test first. If you can not test this software, then you should not use this repository. Software from this repository could be the source of your issue, as this contains components that can change Apache, so these could be potential sources.

-rw-r--r-- 1 root root 2245 Apr 25 2010 CentOS-Base.repo

That is the free and unsupported CentOS OS repository. It contains a lot of software, which you should always test first. It's not commercially supported or tested for conflicts (although they do a bang-up job, so this isn't a ding on them; you get what you pay for, and it's free). If you cannot test this software, then you should not use this repository. Software from this repository could be the source of your issue, as it contains components that can change Apache and the entire operating system.

-rw-r--r-- 1 root root 2347 Mar 31 2009 CentOS-Base.repo.old

Looks like you have a disabled (older?) repo that is the same as above.

-rw-r--r-- 1 root root 626 Apr 25 2010 CentOS-Media.repo

That is the free and unsupported CentOS OS repository. It contains a lot of software, which you should always test first. It's not commercially supported or tested for conflicts (although they do a bang-up job, so this isn't a ding on them; you get what you pay for, and it's free). If you cannot test this software, then you should not use this repository. Software from this repository could be the source of your issue; it's unlikely given what's in the repository, but it could happen.

-rw-r--r-- 1 root root 227 Mar 31 2009 intergenia.repo

No idea, I can't even find that repo. The best I can find is that they are part of the CentOS mirrors, in which case you may have conflicting/overlapping repos set up. If you don't know what this repo is, disable it. Software from this repository could be the source of your issue.

-rw-r--r-- 1 root root 250 Sep 22 06:02 plesk.repo

The Plesk RPM repo, commercially supported by Parallels. This also does not change Apache, so it's also not likely the source of your issues.
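With this many repo files in play, it can help to see which repo sections are actually enabled before deciding where an update came from. Here is a minimal sketch, assuming the usual yum .repo layout (INI-style [section] headers, with a section counted as enabled unless it contains enabled=0). The function name list_enabled_repos is made up for illustration:

```shell
# Print the [section] name of every repo that is not explicitly disabled.
# Assumption: a section without an "enabled=" line counts as enabled.
list_enabled_repos() {
    awk '
        function emit() { if (section != "" && enabled) print section }
        /^\[.*\]/ {
            emit()                                      # finish the previous section
            section = substr($0, 2, index($0, "]") - 2) # text between [ and ]
            enabled = 1
            next
        }
        /^[ \t]*enabled[ \t]*=[ \t]*0/ { enabled = 0 }
        END { emit() }
    ' "$@"
}

# Example call (path taken from the listing above):
#   list_enabled_repos /etc/yum.repos.d/*.repo
```

The *.repo glob skips CentOS-Base.repo.old, which matches how yum itself only reads files ending in .repo.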

Analysis

So, from what I can see you are using a free and unsupported operating system that hasn't had robust testing (although it's fine software by itself), and it has certainly not been certified or tested with whatever else you are running. You are also using several other open source repositories that are also not supported or tested for your software. So you have a lot of software you need to test somehow to know that it's always going to work for you.

You also have a *lot* of web applications that are unknown to me (and maybe to you), that may also be untested for your environment. Since you have a segfault in Apache, those are a very reasonable source, or it may be in PHP, Apache or some Apache module, none of which are commercially supported or tested versions for your environment and its applications.

You are also using two commercial products (ASL and Parallels). We certainly support the ASL components, we do not supply or support anything else. Parallels certainly supports PSA (and anything else you purchased from them), but they too do not support or supply anything else. Neither of those products is likely the cause of your segfaults, as the problem is in Apache and Apache is not supplied by either Parallels or us.

And your dilemma is that ASL is simply reporting a fact: that the repository maintainers are reporting that updates are available for their software. You don't want those updates to break your system, and you aren't using software that anyone has tested to ensure that it won't break your system.

So, whether those updates for their software will cause issues on your platform is not knowable to ASL, and it's not knowable to you without testing. So, what to do?

Solutions

Option A: Get a commercially supported OS and all other software

As you have stated you can't do any testing yourself, you will want to only use software that someone else has tested and supports. So, you will need to use a commercially supported operating system. As you are using CentOS and say that you are happy with it, I recommend you purchase a license for Red Hat Enterprise Linux, which CentOS is based on, and move your sites to a commercially supported platform. You will also want to only use commercially supported web software (for example, if you use phpBB, then you will want to use vBulletin instead, and so on) and make sure all your software is commercially supported and tested before the fact by someone qualified to do this.

Option B: Test your updates yourself.

Set up a simple test environment; as biggles said, you could do this on your desktop. Just get a copy of VMware, copy your server over to your desktop, run your updates there, and if they don't break anything you can try them on your production system. And yes, you may still have to roll back an OS update; you never know what could happen. Maybe you have some application that uses an older function that the OS vendor deprecated, and you still need it.

Option C: Pay someone qualified to do the testing for you

Hire a qualified engineer or engineers to do this testing for you. This is how it's done in the commercial and government sectors, and it offers you a lot of flexibility: it lets you run open source, free, unsupported software with the same reliability as commercial software. Again, it's all about testing, so if you pay someone qualified to test it for you, you can get the same quality you would get from commercial software.

Option D: Install and rollback if you run into issues

Use the golden rule: "What Changed?" If something was working, and you made a change (installing an OS update for example), then it's logical to assume that was the cause. So, here's a cheap way of living without any test environment (a test environment is ideal, so this is field expedient, not the preferred or recommended approach):

1. Configure yum to save rollback information.

Add the line

tsflags=repackage

to /etc/yum.conf.

2. Configure command-line rpm to save rollback information:

Add the line:

%_repackage_all_erasures 1

to /etc/rpm/macros.

3) Install the update(s)

4) Watch your system

5) If they break something, check this log to see what changed and when it changed:

/var/log/yum.log

6) Back out the updates, which you can do with either --rollback or --oldpackage:

Method 1:

To rollback to a previous state, perform an rpm update with the --rollback option followed by a date/time specifier.

Examples:

rpm -Uhv --rollback '9:00 am'
rpm -Uhv --rollback '4 hours ago'
rpm -Uhv --rollback 'december 25'

Method 2:

Use the "oldpackage" option to manually force a specific RPM:

rpm -Uvh --oldpackage foo-1-1.i386.rpm

Keep in mind this will only let you roll back what can be rolled back. Some OS updates are not reversible; for example, if you did an upgrade of MySQL that changed your tables, you would not be able to roll back this way.

Method 3:

Perform nightly backups of your system (you should do this anyway), and use them to restore the system to how it was before the update. If you set up something like the rdiff-backup schema in this post:

https://www.atomicorp.com/forums/viewto ... kup#p15855

Or something else that allows for easy access to live copies of your backups (like BackupPC), you can pick and choose what you want to restore (such as your DBs if you changed the schema). You can also use this in conjunction with rollbacks; it's particularly helpful with web applications and things that are not managed by package management.
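Going back to steps 1 and 2 of Option D: both come down to making sure one line exists in a config file. Here is a minimal sketch of doing that without adding the line twice. The helper name ensure_line is made up for illustration, and for yum.conf it assumes [main] is the last (or only) section in the file, since that is where tsflags needs to live:

```shell
# Append a line to a config file only if that exact line is not already there.
# $1: path to the file, $2: the line to add
ensure_line() {
    file=$1
    line=$2
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example calls, run as root (assumes [main] is the last section of yum.conf):
#   ensure_line /etc/yum.conf 'tsflags=repackage'
#   ensure_line /etc/rpm/macros '%_repackage_all_erasures 1'
```

Running it a second time leaves the file unchanged, so it is safe to keep in a setup script.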

Conclusion

The bottom line is that someone has to test your changes and you have to have a plan to roll back. Anyone that tells you otherwise is mistaken and probably trying to sell you something.

So to recap:

Option A: Switch to all commercial software and transfer the responsibility of testing to someone else

Option B: Test your updates before you roll them out, and roll them back if they don't work for you

Option C: Hire people to do this testing for you

Option D: Update your production system without testing, watch for issues, and rollback if you need to

I hope this helps!

breun · Sun Mar 06, 2011 7:15 pm

That's a mighty large post.

I was just wondering what your definition of 'changing Apache' is. You say 'the asl.repo does not change Apache', but ASL installs multiple Apache modules (mod_security and mod_evasive come to mind). How is that different from how repositories like Atomic or the CentOS repositories 'change Apache'?

mikeshinn · Sun Mar 06, 2011 8:27 pm

breun wrote: I was just wondering what your definition of 'changing Apache' is. You say 'the asl.repo does not change Apache', but ASL installs multiple Apache modules (mod_security and mod_evasive come to mind). How is that different from how repositories like Atomic or the CentOS repositories 'change Apache'?

Valid point, post updated. The Atomic and CentOS repositories contain Apache itself, so installing updates from them can literally change Apache; that is what I meant. The ASL repo does not contain Apache, so it cannot change Apache itself, and neither does the Plesk repo, but both do modify it and install other modules.

ASL only installs the following modules for Apache:

mod_security
mod_evasive
And, if it's missing, mod_unique_id (from the OS repository).

Those can change the behaviour of Apache, and certainly any module could cause an error or potentially a segfault. But they don't change Apache itself. It's a minor point, I concede, so for the purposes of this thread let's assume that those modules could be at fault as well. It's reasonable to evaluate all causes and determine the root of the problem.

Segfault Analysis

Given that the ASL repo installs some Apache modules listed above, that leads to the question "are those modules causing segfaults?" No evidence thus far exists to conclude they are the cause, and all evidence thus far has demonstrated that those modules were not the source of any reported segfaults.

I'm confident saying that because we have looked into every report of a segfault, over many years now, and in every case (including one just a few weeks ago on a C5.5 box, where it was a bug in Apache's APR) it was either:

1. an Apache bug (APR in Apache prior to 2.2.17 has a bug that can cause segfaults randomly in some rare cases; very rare, but it happens)
2. PHP
3. a PHP optimizer (right on the heels of PHP)
4. a bad webapp written in PHP
5. a bad htaccess loop
6. Or a bug in Apache itself.

Not one single case of a segfault was caused by either of those modules, and it's not surprising because they are both rock solid and in very widespread use. But we do have lots and lots of examples of OS components that cause segfaults. However, let's try to get to the bottom of this.

Solution

Step 1:

Check your yum logs:

/var/log/yum.log

Ask yourself "When did the segfaults start, and what changed then?"

This will give you some idea whether a module, OS, library, etc. update may be the cause. Roll back and see if the segfaults go away; if they do, then you have a pretty good idea of what caused them. Keep in mind that changes in memory use are not causes: if you have a webapp or something that uses a lot of memory and you get segfaults, the memory use is only correlated with them. More memory in use means more opportunities for the fault to occur, so seeing segfaults under high memory use and none under low memory use does not make memory the cause. You can rule that out.
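To line up the start of the segfaults with what changed, it helps to pull out just the log entries for the day in question. Here is a minimal sketch, assuming yum.log lines look like "Mar 06 09:15:42 Updated: <package>" (check a few lines of your own log first). The function name yum_changes_on is made up for illustration:

```shell
# Print the yum.log entries recorded on a given day.
# $1: path to the log, $2: day in the log's own "Mon DD" form, e.g. "Mar 06"
# Assumption: each entry starts with "Mon DD HH:MM:SS " followed by
# Installed:, Updated: or Erased:.
yum_changes_on() {
    grep -E "^$2 [0-9:]{8} (Installed|Updated|Erased): " "$1"
}

# Example call:
#   yum_changes_on /var/log/yum.log "Mar 06"
```

If nothing is printed for the day the segfaults began, widen the search to the days before it, since a fault may only show up once the updated component is actually exercised.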

Step 2:

So, if your rollback doesn't give you a root cause, then we need to determine:

1. That you don't have a bug in Apache itself. Set up your system to capture core files, install the debuginfo for Apache and its modules, and do a backtrace; then you'll see the cause.
2. That you don't have a bug in PHP; the same advice applies.
3. That your webapps are not at fault. Check them; you'd be surprised what they can do to themselves.
4. That you don't have a monster of an htaccess file on your hands. Check them; a bad one can kill the server.

Please work with your OS vendor to rule out 1-4 first; so far it's always been one of those four. If you have a really good feeling that either mod_sec or mod_ev is causing a segfault, we would be happy to look into it; after all, those are provided by us and we support them. If you have a case where you can rule out your OS, PHP, Apache, etc., please make sure you have a core file for the segfault and a backtrace. We can't do anything without a core file.

Add the following to your Apache configuration:

CoreDumpDirectory /tmp

Check that your system allows cores:

ulimit -c

If you see "0", that means they are disabled. You will need to set them to unlimited. Keep in mind that core files can be large: if Apache is using more memory than you have room for on your disk, you will eat up all your disk space, and you will get a core for EACH segfault. So don't leave this on for days; watch for cores, and when you get a few, turn off core dump support in Apache. To make core dumps unlimited, run this as root:

ulimit -c unlimited
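If you want to check the limit from a script rather than by eye, the value printed by ulimit -c can be turned into a plain status message. This is only a sketch, and the function name core_limit_status is made up for illustration:

```shell
# Translate the value printed by `ulimit -c` into a plain-language status.
# $1: the value to interpret (0, unlimited, or a block count)
core_limit_status() {
    case "$1" in
        0)         echo "core dumps disabled" ;;
        unlimited) echo "core dumps enabled, no size limit" ;;
        *)         echo "core dumps enabled, limited to $1 blocks" ;;
    esac
}

# Example call:
#   core_limit_status "$(ulimit -c)"
```

Note that ulimit only affects the current shell and the processes started from it, so the limit has to be raised in the same shell (or init script) that then starts Apache.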

Install the debuginfo rpm for httpd, without this your core will be useless:

yum install httpd-debuginfo

If your OS is missing debuginfo, file a bug report with them. It's nothing we can fix, although we do make debuginfo rpms available in the atomic repository for our Apache builds.

Restart Apache. Watch for core files in /tmp; when you get a few, remove CoreDumpDirectory /tmp from your Apache config, reset your core limits back to 0, and restart Apache.

Then generate a backtrace with this command:

gdb /path/to/httpd /path/to/core --batch --quiet -ex "thread apply all bt full" > backtrace.log

And send it to us. Please keep in mind that this has been looked into many, many times before, so it's unlikely the modules are the cause, and if they are not, we wouldn't be able to support a fix in Apache itself. We may have a version of Apache, PHP, etc. in the Atomic channel that may solve the problem, but those are not supported (although we do work damn hard to make sure they are commercial grade, as we use them too). In every case so far the problem was either Apache, APR, PHP, a webapp or an htaccess run amok, and those latter issues are definitely something we have no way of helping with except moral support and advice.
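When several core files have piled up, the gdb command above has to be repeated once per core. Here is a minimal sketch that prints one such command per core file so you can review them before running anything. It assumes the cores are named "core" or "core.<pid>", and the function name print_backtrace_cmds is made up for illustration:

```shell
# Print one gdb command per core file found in a directory.
# $1: path to the httpd binary, $2: directory holding the core files
# Assumption: core files are named "core" or "core.<pid>".
print_backtrace_cmds() {
    for core in "$2"/core*; do
        [ -f "$core" ] || continue   # the glob matched nothing
        printf 'gdb %s %s --batch --quiet -ex "thread apply all bt full" > %s.backtrace.log\n' \
            "$1" "$core" "$core"
    done
}

# Example call (the httpd path is an assumption, adjust it for your system):
#   print_backtrace_cmds /usr/sbin/httpd /tmp
```

Printing the commands instead of running them keeps the helper harmless on a production box; each backtrace lands next to its core file, so it is clear which log belongs to which crash.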

Summary

Unfortunately, there are lots of cases where APR, PHP, PHP optimizers/encoders, bad webapps and htaccess files caused a segfault. There have been no cases where either mod_sec or mod_ev caused a segfault.

With that said, I know for a fact that old versions of APR have a bug that will cause a segfault. We released Apache 2.2.17 many months ago in the testing channel for several customers running proxies that ran into this. So if you see segfaults start with Apache, check your optimization modules and PHP, and if that doesn't fix it, try upgrading to 2.2.17. Make sure your system works with a newer Apache first: Plesk on C4, for example, won't work with Apache 2.2. File a bug report with Parallels on that one; it's not a bug in Apache, it's that the configs Plesk uses on C4 are written for an older version of Apache.

All of these are known causes of segfaults that are fairly easy to remedy and test.

mikeshinn · Sun Mar 06, 2011 9:01 pm

I believe that the issue started after a yum ioncube update but I cannot find any more information about the error from the logs.

What happens if you roll back to an earlier version of ioncube? A bad/broken version can definitely cause segfaults.

premierhosting · Tue Mar 08, 2011 12:23 am

Beautiful info, probably going to a wiki near you?

Question this generated for me: Since someone here recommended not using the Plesk GUI updater, does that indicate it's a good idea to use the Plesk repo and make those updates by yum? Just making sure the asl and atomic repos don't handle that stuff.

breun · Tue Mar 08, 2011 3:28 am

premierhosting wrote:Question this generated for me: Since someone here recommended not using the Plesk GUI updater, does that indicate it's a good idea to use the Plesk repo and make those updates by yum?

That's a matter of taste. I like to use Plesk's updater, since it also offers the micro-updates. Though I also hate micro-updates, because they are not packaged (which explains why they can't be distributed via a yum repository). I prefer the CLI updater (autoinstaller) myself, not the web GUI one.

premierhosting wrote: Just making sure the asl and atomic repos don't handle that stuff.

The ASL and Atomic repositories don't contain Plesk, there is a separate Plesk yum repository.

webfeatus · Tue Mar 08, 2011 11:36 am

BTW, thank you for this excellent explanation and the input from other members also.

I am still working my way through all the suggestions...

premierhosting · Thu Mar 10, 2011 3:59 pm

breun wrote:I prefer the CLI updater (autoinstaller) myself, not the web GUI one.

/usr/local/psa/admin/bin/autoinstaller

?

breun · Sat Mar 12, 2011 4:02 am

premierhosting wrote:
breun wrote:I prefer the CLI updater (autoinstaller) myself, not the web GUI one.
/usr/local/psa/admin/bin/autoinstaller

?

Yes.

premierhosting · Sat Mar 12, 2011 7:23 pm

oooh, I like this!

Thank you!

webfeatus · Tue Mar 15, 2011 8:20 pm

mikeshinn wrote:
I believe that the issue started after a yum ioncube update but I cannot find any more information about the error from the logs.
What happens if you rollback to an earlier version of ioncube? A bad/broken version can definitely cause segfaults.

Could these 2 instances, seen reported during asl -s, indicate that I have an issue with ioncube? (which may be causing the segfault)

 Checking executable stack flag on PHP extensions
  /usr/lib/php/ioncube/ioncube_loader_lin_5.2.so :  [OK]
  /usr/lib/php/ioncube/ioncube_loader_lin_5.2.so :  [OK]
  /usr/lib/php/zend/ZendOptimizer-5.2.so :  [OK]

scott · Wed Mar 16, 2011 8:56 am

Maybe, you definitely have something misconfigured if you're trying to invoke ioncube twice.
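One possible way to end up loading ioncube twice is having the same zend_extension line in more than one PHP config file. Here is a minimal sketch for spotting that. The function name find_duplicate_zend_extensions is made up for illustration, and the paths in the example call are assumptions about where your PHP configuration lives:

```shell
# Print any zend_extension line that appears more than once in the given files.
# Whitespace is stripped first so "zend_extension = X" and "zend_extension=X"
# count as the same line.
find_duplicate_zend_extensions() {
    grep -hE '^[[:space:]]*zend_extension' "$@" | sed 's/[[:space:]]//g' | sort | uniq -d
}

# Example call (paths are assumptions, adjust them for your system):
#   find_duplicate_zend_extensions /etc/php.ini /etc/php.d/*.ini
```

No output means no loader line is repeated in those files; any line it does print is worth removing from all but one place.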
