Server gets overloaded

Community support for Plesk, CPanel, WebMin and others with insight from two of the founders of Plesk. Ask for help here! No question is too simple or complicated. :-)
biggles
Forum Regular
Forum Regular
Posts: 806
Joined: Tue Jul 15, 2008 2:38 pm
Location: Sweden
Contact:

Server gets overloaded

Unread post by biggles »

I really need your help. Since a few weeks my server seems to be overloaded. It manifests by stopping to respond to services (mail, httpd etc). When trying to login to ssh I get a greeting and it's possible to insert username and ssh key password, but then it stops responding after preseting the ssh greeting.When logging in to the console I can see some kernel messages and a login prompt. The part of the kernel message I can read says something about httpd. I'm not able to login. I can send "restart guest" to the server (it's a vmware vps) which makes the server switch to runlevel 6 and complete the reboot process. The reboot process takes quite a long time, especially at "stopping httpd service"

My question to you is where to start looking for errors. Which logs to read? Is there some kind of kernel dump I can access after restart?

I have also noted some clamav failiures during the last week. They start like qmail-scannar taking foreever to scan an e-mail which results in no mail being delivered. Often there is some bigger email in the queue, but not an enourmous one (just a few MBs). Can it be related?

The server has four processor cores assigned to it and 4 GB RAM. Running latest ASL kernel on CentOS 5.6 (32-bit PAE). The processor seems more or less idle when it hangs.
biggles
Forum Regular
Forum Regular
Posts: 806
Joined: Tue Jul 15, 2008 2:38 pm
Location: Sweden
Contact:

Re: Server gets overloaded

Unread post by biggles »

This looks like the first error in messages:

Code: Select all

May 29 15:47:36 server7 kernel: INFO: task mysqld:20852 blocked for more than 120 seconds.
May 29 15:47:36 server7 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 29 15:47:36 server7 kernel: mysqld        D 000157c3     0 20852   3273 0x00000080
May 29 15:47:36 server7 kernel:  edc01cb8 00200082 dc197020 000157c3 edc01c5c c045e60e 00200292 f6afc600
May 29 15:47:36 server7 kernel:  00006160 00006160 00006160 00000002 ca8d2f40 ca8d31e4 c3b86160 00000003
May 29 15:47:36 server7 kernel:  dc251602 000157c3 edc01c88 f9111646 edc01ca0 f91116dd ca8d31e4 f72a3084
May 29 15:47:36 server7 kernel: Call Trace:
May 29 15:47:36 server7 kernel:  [<c045e60e>] ? wake_up_bit+0x61/0x66
May 29 15:47:36 server7 kernel:  [<f9111646>] ? call_xp_up+0x1e/0x20 [dazuko]
May 29 15:47:36 server7 kernel:  [<f91116dd>] ? dazuko_change_slot_state+0x3e/0x46 [dazuko]
May 29 15:47:36 server7 kernel:  [<f91111b1>] ? call_xp_atomic_read+0x21/0x23 [dazuko]
May 29 15:47:36 server7 kernel:  [<f9111733>] ? get_ready_slot_condition+0x4e/0x68 [dazuko]
May 29 15:47:36 server7 kernel:  [<f91116e5>] ? get_ready_slot_condition+0x0/0x68 [dazuko]
May 29 15:47:36 server7 kernel:  [<f9115269>] xp_wait_until_condition+0x109/0x11f [dazuko]
May 29 15:47:36 server7 kernel:  [<c045e613>] ? autoremove_wake_function+0x0/0x38
May 29 15:47:36 server7 kernel:  [<f91118f8>] call_xp_wait_until_condition+0x2e/0x31 [dazuko]
May 29 15:47:36 server7 kernel:  [<f9111956>] dazuko_run_daemon_on_slotlist+0x38/0x169 [dazuko]
May 29 15:47:36 server7 kernel:  [<f9114c5c>] ? xp_free+0xd/0x11 [dazuko]
May 29 15:47:36 server7 kernel:  [<f9114c6d>] ? xp_id_free+0xd/0x11 [dazuko]
May 29 15:47:36 server7 kernel:  [<f9111ba3>] dazuko_process_access+0x11c/0x183 [dazuko]
May 29 15:47:36 server7 kernel:  [<f9114dfe>] dazuko_sys_generic+0xfc/0x13f [dazuko]
May 29 15:47:36 server7 kernel:  [<f9114e6c>] dazukoflt_open+0x2b/0x45 [dazuko]
May 29 15:47:36 server7 kernel:  [<c0578521>] ? journal_stop+0x223/0x22e
May 29 15:47:36 server7 kernel:  [<f90f1773>] ? rfs_inode_set_rinfo_fast+0x46/0x5d [redirfs]
May 29 15:47:36 server7 kernel:  [<f90f5025>] rfs_precall_flts+0x4d/0x994 [redirfs]
May 29 15:47:36 server7 kernel:  [<f90f07bd>] rfs_open+0x15b/0x2e7 [redirfs]
May 29 15:47:36 server7 kernel:  [<c04e4caa>] __dentry_open+0x17b/0x27a
May 29 15:47:36 server7 kernel:  [<c04e4e4d>] nameidata_to_filp+0x31/0x48
May 29 15:47:36 server7 kernel:  [<f90f0662>] ? rfs_open+0x0/0x2e7 [redirfs]
May 29 15:47:36 server7 kernel:  [<c04f3a31>] do_filp_open+0x338/0x883
May 29 15:47:36 server7 kernel:  [<c05dad85>] ? gr_learn_resource+0x40/0x170
May 29 15:47:36 server7 kernel:  [<c04e4a29>] do_sys_open+0x55/0xed
May 29 15:47:36 server7 kernel:  [<c04e4b0d>] sys_open+0x23/0x2b
May 29 15:47:36 server7 kernel:  [<c040931c>] syscall_call+0x7/0xb
biggles
Forum Regular
Forum Regular
Posts: 806
Joined: Tue Jul 15, 2008 2:38 pm
Location: Sweden
Contact:

Re: Server gets overloaded

Unread post by biggles »

Which gets repeated a few more times during the next ten minutes: Then I get:

Code: Select all

May 29 15:57:36 server7 kernel: INFO: task asl-shun.pl:29312 blocked for more than 120 seconds.
May 29 15:57:36 server7 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 29 15:57:36 server7 kernel: asl-shun.pl   D 00015807     0 29312  29311 0x00000080
May 29 15:57:36 server7 kernel:  e3d21cb8 00200086 048f09c9 00015807 e3d21c5c c045e60e 00200292 f6afc600
May 29 15:57:36 server7 kernel:  00006160 00006160 00006160 000031b0 f4e90000 f4e902a4 c3a86160 00000001
May 29 15:57:36 server7 kernel:  e3d21c80 f911514f e3d21c88 f9111646 e3d21ca0 c3a831b0 f4e902a4 05a1d60a
May 29 15:57:36 server7 kernel: Call Trace:
May 29 15:57:36 server7 kernel:  [<c045e60e>] ? wake_up_bit+0x61/0x66
May 29 15:57:36 server7 kernel:  [<f911514f>] ? xp_up+0xd/0xf [dazuko]
May 29 15:57:36 server7 kernel:  [<f9111646>] ? call_xp_up+0x1e/0x20 [dazuko]
May 29 15:57:36 server7 kernel:  [<f91116e5>] ? get_ready_slot_condition+0x0/0x68 [dazuko]
May 29 15:57:36 server7 kernel:  [<f9115269>] xp_wait_until_condition+0x109/0x11f [dazuko]
May 29 15:57:36 server7 kernel:  [<c045e613>] ? autoremove_wake_function+0x0/0x38
May 29 15:57:36 server7 kernel:  [<f91118f8>] call_xp_wait_until_condition+0x2e/0x31 [dazuko]
May 29 15:57:36 server7 kernel:  [<f9111956>] dazuko_run_daemon_on_slotlist+0x38/0x169 [dazuko]
May 29 15:57:36 server7 kernel:  [<f9114c5c>] ? xp_free+0xd/0x11 [dazuko]
May 29 15:57:36 server7 kernel:  [<f9114c6d>] ? xp_id_free+0xd/0x11 [dazuko]
May 29 15:57:36 server7 kernel:  [<f9111ba3>] dazuko_process_access+0x11c/0x183 [dazuko]
May 29 15:57:36 server7 kernel:  [<f9114dfe>] dazuko_sys_generic+0xfc/0x13f [dazuko]
May 29 15:57:36 server7 kernel:  [<f9114e6c>] dazukoflt_open+0x2b/0x45 [dazuko]
May 29 15:57:36 server7 kernel:  [<c04dcfa6>] ? kmem_cache_alloc+0x80/0xfc
May 29 15:57:36 server7 kernel:  [<f90f2366>] ? rfs_inode_add+0x31/0x1b8 [redirfs]
May 29 15:57:36 server7 kernel:  [<c0578521>] ? journal_stop+0x223/0x22e
May 29 15:57:36 server7 kernel:  [<f90f1773>] ? rfs_inode_set_rinfo_fast+0x46/0x5d [redirfs]
May 29 15:57:36 server7 kernel:  [<f90f5025>] rfs_precall_flts+0x4d/0x994 [redirfs]
May 29 15:57:36 server7 kernel:  [<f90f07bd>] rfs_open+0x15b/0x2e7 [redirfs]
May 29 15:57:36 server7 kernel:  [<c04e4caa>] __dentry_open+0x17b/0x27a
May 29 15:57:36 server7 kernel:  [<c04e4e4d>] nameidata_to_filp+0x31/0x48
May 29 15:57:36 server7 kernel:  [<f90f0662>] ? rfs_open+0x0/0x2e7 [redirfs]
May 29 15:57:36 server7 kernel:  [<c04f3a31>] do_filp_open+0x338/0x883
May 29 15:57:36 server7 kernel:  [<c05dad85>] ? gr_learn_resource+0x40/0x170
May 29 15:57:36 server7 kernel:  [<c04e4a29>] do_sys_open+0x55/0xed
May 29 15:57:36 server7 kernel:  [<c04e4b0d>] sys_open+0x23/0x2b
May 29 15:57:36 server7 kernel:  [<c040931c>] syscall_call+0x7/0xb
biggles
Forum Regular
Forum Regular
Posts: 806
Joined: Tue Jul 15, 2008 2:38 pm
Location: Sweden
Contact:

Re: Server gets overloaded

Unread post by biggles »

And this seems to happen when the server stops responding:

Code: Select all

May 29 18:31:05 server7 kernel: named invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
May 29 18:31:05 server7 kernel: named cpuset=/ mems_allowed=0
May 29 18:31:05 server7 kernel: Pid: 3225, comm: named Not tainted 2.6.32.28-1.art.i686.PAE #1
May 29 18:31:05 server7 kernel: Call Trace:
May 29 18:31:05 server7 kernel:  [<c04b1a9c>] oom_kill_process+0x68/0x24c
May 29 18:31:05 server7 kernel:  [<c04b20c1>] __out_of_memory+0x7e/0x8d
May 29 18:31:05 server7 kernel:  [<c04b2133>] out_of_memory+0x63/0x93
May 29 18:31:05 server7 kernel:  [<c04b4bb5>] __alloc_pages_nodemask+0x3d9/0x4b6
May 29 18:31:05 server7 kernel:  [<c04b60e8>] __do_page_cache_readahead+0xa5/0x15e
May 29 18:31:05 server7 kernel:  [<c04b61bd>] ra_submit+0x1c/0x21
May 29 18:31:05 server7 kernel:  [<c04b026d>] filemap_fault+0x173/0x2fd
May 29 18:31:05 server7 kernel:  [<c04c43b5>] __do_fault+0x4c/0x453
May 29 18:31:05 server7 kernel:  [<c04c68fe>] handle_mm_fault+0x731/0xfc2
May 29 18:31:05 server7 kernel:  [<c04e7136>] ? do_sync_write+0xb0/0xee
May 29 18:31:06 server7 kernel:  [<c045e613>] ? autoremove_wake_function+0x0/0x38
May 29 18:31:06 server7 kernel:  [<c04e6d1d>] ? fsnotify_modify+0x62/0x6d
May 29 18:31:06 server7 kernel:  [<c07e6cbd>] ? do_page_fault+0x0/0x5d2
May 29 18:31:06 server7 kernel:  [<c07e7279>] do_page_fault+0x5bc/0x5d2
May 29 18:31:06 server7 kernel:  [<c04f129f>] ? path_put+0x1a/0x1d
May 29 18:31:06 server7 kernel:  [<c0429efe>] ? smp_invalidate_interrupt+0x7b/0x8d
May 29 18:31:06 server7 kernel:  [<c07e6cbd>] ? do_page_fault+0x0/0x5d2
May 29 18:31:06 server7 kernel:  [<c07e53e6>] error_code+0x76/0x80
May 29 18:31:06 server7 kernel: Mem-Info:
May 29 18:31:06 server7 kernel: DMA per-cpu:
May 29 18:31:06 server7 kernel: CPU    0: hi:    0, btch:   1 usd:   0
May 29 18:31:06 server7 kernel: CPU    1: hi:    0, btch:   1 usd:   0
May 29 18:31:06 server7 kernel: CPU    2: hi:    0, btch:   1 usd:   0
May 29 18:31:06 server7 kernel: CPU    3: hi:    0, btch:   1 usd:   0
May 29 18:31:06 server7 kernel: Normal per-cpu:
May 29 18:31:06 server7 kernel: CPU    0: hi:  186, btch:  31 usd:  23
May 29 18:31:06 server7 kernel: CPU    1: hi:  186, btch:  31 usd:  60
May 29 18:31:06 server7 kernel: CPU    2: hi:  186, btch:  31 usd:  30
May 29 18:31:06 server7 kernel: CPU    3: hi:  186, btch:  31 usd:  29
May 29 18:31:06 server7 kernel: HighMem per-cpu:
May 29 18:31:07 server7 kernel: CPU    0: hi:  186, btch:  31 usd:  21
May 29 18:31:07 server7 kernel: CPU    1: hi:  186, btch:  31 usd:  30
May 29 18:31:07 server7 kernel: CPU    2: hi:  186, btch:  31 usd:   0
May 29 18:31:07 server7 kernel: CPU    3: hi:  186, btch:  31 usd:  14
May 29 18:31:07 server7 kernel: active_anon:746798 inactive_anon:205833 isolated_anon:0
May 29 18:31:07 server7 kernel:  active_file:88 inactive_file:244 isolated_file:96
May 29 18:31:07 server7 kernel:  unevictable:1486 dirty:0 writeback:0 unstable:0
May 29 18:31:07 server7 kernel:  free:28336 slab_reclaimable:9847 slab_unreclaimable:7437
May 29 18:31:07 server7 kernel:  mapped:1731 shmem:627 pagetables:20794 bounce:0
May 29 18:31:07 server7 kernel: DMA free:7960kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_fil          e:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15804kB mlocked:0kB dirty:0kB writ          eback:0kB mapped:0kB shmem:0kB slab_reclaimable:52kB slab_unreclaimable:12kB kernel_stack:0kB pagetables:0kB unstable:0k          B bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
May 29 18:31:07 server7 kernel: lowmem_reserve[]: 0 863 4040 4040
May 29 18:31:07 server7 kernel: Normal free:105468kB min:3724kB low:4652kB high:5584kB active_anon:318088kB inactive_ano          n:289464kB active_file:0kB inactive_file:152kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:883912kB ml          ocked:0kB dirty:0kB writeback:0kB mapped:36kB shmem:0kB slab_reclaimable:39336kB slab_unreclaimable:29736kB kernel_stack          :10160kB pagetables:12928kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
May 29 18:31:07 server7 kernel: lowmem_reserve[]: 0 0 25414 25414
May 29 18:31:07 server7 kernel: HighMem free:468kB min:512kB low:3936kB high:7364kB active_anon:2669104kB inactive_anon:          534124kB active_file:352kB inactive_file:616kB unevictable:5944kB isolated(anon):0kB isolated(file):384kB present:325310          8kB mlocked:5944kB dirty:0kB writeback:0kB mapped:6888kB shmem:2508kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel          _stack:0kB pagetables:70248kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:32 all_unreclaimable? no
May 29 18:31:07 server7 kernel: lowmem_reserve[]: 0 0 0 0
May 29 18:31:07 server7 kernel: DMA: 4*4kB 13*8kB 12*16kB 7*32kB 10*64kB 7*128kB 5*256kB 3*512kB 1*1024kB 1*2048kB 0*409          6kB = 7960kB
May 29 18:31:07 server7 kernel: Normal: 785*4kB 1507*8kB 966*16kB 599*32kB 377*64kB 144*128kB 41*256kB 3*512kB 1*1024kB           0*2048kB 0*4096kB = 105436kB
May 29 18:31:07 server7 kernel: HighMem: 57*4kB 30*8kB 5*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*          4096kB = 548kB
May 29 18:31:07 server7 kernel: 85932 total pagecache pages
May 29 18:31:07 server7 kernel: 83934 pages in swap cache
May 29 18:31:07 server7 kernel: Swap cache stats: add 534267, delete 450333, find 472747/474188
May 29 18:31:07 server7 kernel: Free swap  = 0kB
May 29 18:31:07 server7 kernel: Total swap = 2096472kB
May 29 18:31:07 server7 kernel: 1310704 pages RAM
May 29 18:31:07 server7 kernel: 1083906 pages HighMem
May 29 18:31:07 server7 kernel: 281194 pages reserved
May 29 18:31:08 server7 kernel: 2280770 pages shared
May 29 18:31:08 server7 kernel: 982925 pages non-shared
May 29 18:31:08 server7 kernel: Out of memory: kill process 3273 (mysqld_safe) score 376762 or a child
May 29 18:31:08 server7 kernel: Killed process 3398 (mysqld)
scott
Atomicorp Staff - Site Admin
Atomicorp Staff - Site Admin
Posts: 8355
Joined: Wed Dec 31, 1969 8:00 pm
Location: earth
Contact:

Re: Server gets overloaded

Unread post by scott »

Yeah see that happening in processes all over the place? Its indicative of bad hardware (probably ram) or maybe even that its overheating.
biggles
Forum Regular
Forum Regular
Posts: 806
Joined: Tue Jul 15, 2008 2:38 pm
Location: Sweden
Contact:

Re: Server gets overloaded

Unread post by biggles »

Doesn't sound good. Don't see any temp alarms on the IPMI sensors and it is installed in an environment controlled server facility, soo I thinking more of defective RAM. Gah, I hate the process of finding out which module is failing...
User avatar
mikeshinn
Atomicorp Staff - Site Admin
Atomicorp Staff - Site Admin
Posts: 4149
Joined: Thu Feb 07, 2008 7:49 pm
Location: Chantilly, VA

Re: Server gets overloaded

Unread post by mikeshinn »

You may also have something leaking memory badly, I see the oom_kill_process in one your messages and that would be the kernel trying to save the system by killing off processes that either are leaking too much memory, or are just using memory that system desperately needs. I'd setup some process monitoring software and see if you can catch what may be eating up memory too, and it may not be one thing it could be tons of little things that just tip the system over.

Get some data though first, you got a lot going on there and not enough information yet to know whats actually causing these.
biggles
Forum Regular
Forum Regular
Posts: 806
Joined: Tue Jul 15, 2008 2:38 pm
Location: Sweden
Contact:

Re: Server gets overloaded

Unread post by biggles »

Anyone got a tip for a good tool to monitor the processes that doesn't need to much configuration?
User avatar
mikeshinn
Atomicorp Staff - Site Admin
Atomicorp Staff - Site Admin
Posts: 4149
Joined: Thu Feb 07, 2008 7:49 pm
Location: Chantilly, VA

Re: Server gets overloaded

Unread post by mikeshinn »

Well, sysstat will collect data and all you have to do is run yum install sysstat.

Depends on what you want interface wise beyond that. Any thoughts?
biggles
Forum Regular
Forum Regular
Posts: 806
Joined: Tue Jul 15, 2008 2:38 pm
Location: Sweden
Contact:

Re: Server gets overloaded

Unread post by biggles »

Nice! Already had it installed! Now I got sa-logs for the last seven days. Anything special I should look for?

BTW, thanks a lot for your help. Excellent support, as always!
biggles
Forum Regular
Forum Regular
Posts: 806
Joined: Tue Jul 15, 2008 2:38 pm
Location: Sweden
Contact:

Re: Server gets overloaded

Unread post by biggles »

I have done some monitoring with zabbix. See attached graph. The graph shows two times when the server has stated to behave strangely. At the first spike the server was completely hang. The second one I happened to log in when it happened and saw that clamd was a hung process (quite a lot of them actually. When I restarted the clamd process (/etc/init.d/clamd restart) the process dropped down to normal. This was the first time I was able to handle the error without rebooting the server. apache was still answering calls but mysql didn't, resulting in database connect errors on database driven web sites. When clamd was restarted everything started working again.

Since then I have turned of dazuko scanning and haven't had a single failure. Not enough time to rule anything out yet, but it looks like a suspect.

Today I had a few errors like this from qmail-scanner in the maillog. No errors in clamd.log, but it looks like it restarted at the same time.

Code: Select all

X-Qmail-Scanner-2.08st: [server7.domain.com 13074388367986868] clamdscan: corrupt or unknown clamd scanner error or memory/resource/perms problem - exit status 512/2
Anyone got any suggestions?
Attachments
Number of processes
Number of processes
zabbix.png (29.19 KiB) Viewed 12203 times
User avatar
mikeshinn
Atomicorp Staff - Site Admin
Atomicorp Staff - Site Admin
Posts: 4149
Joined: Thu Feb 07, 2008 7:49 pm
Location: Chantilly, VA

Re: Server gets overloaded

Unread post by mikeshinn »

Hmmmm... is there a process that kicks off around that time (like a cronjob)? Or a big batch of email or something that could be overloading clamd?
biggles
Forum Regular
Forum Regular
Posts: 806
Joined: Tue Jul 15, 2008 2:38 pm
Location: Sweden
Contact:

Re: Server gets overloaded

Unread post by biggles »

Nope, not that I can see at least. It really hasn't happened on the same time of day, but you might be right. I do have an externala script that backs up he server (rsync-diff). It starts 23:15.
biggles
Forum Regular
Forum Regular
Posts: 806
Joined: Tue Jul 15, 2008 2:38 pm
Location: Sweden
Contact:

Re: Server gets overloaded

Unread post by biggles »

Now more then 10 days without a hiccup after disabling dazuko. I am pretty confident the problem is caused by some kind of race condition where dazuko is involved. Anyone got any suggestions how to search for the underlying problem?
User avatar
mikeshinn
Atomicorp Staff - Site Admin
Atomicorp Staff - Site Admin
Posts: 4149
Joined: Thu Feb 07, 2008 7:49 pm
Location: Chantilly, VA

Re: Server gets overloaded

Unread post by mikeshinn »

Hmmmm, it may not be a race condition, it might be a pipelining issue with so many files backed up for scanning. Do you know if clamd was busy when this occured? And I believe you said you have a backup script that runs around the time of the first spike in processes on your system, was dazuko watching directories that were being backed up?
Post Reply