Core Dump Error Message
Posted: Sat Oct 18, 2014 1:37 am
by KrazyBob
I am really out of my league these days. My newest Plesk Virtuozzo 4.7 server has just started spitting out this error message, and I cannot find any reference to it on Google. It helps to know "what" to type. I am unable to get a backup off the server because the load goes very, very high: it backs up less than 10G of the 72G and then starts over. I cannot even use vzmigrate to move the container without a high load, and Virtuozzo no longer supports the --lazy flag. Without a recent backup I am terrified to reboot the server. The error message tells me that it is trying to abort from a failed command, which suggests a kernel issue. But I am wondering whether a reboot is safe without a current backup.
Suggestions?
Code:
[26138940.488268] Core dump to |/usr/libexec/abrt-hook-ccpp 11 0 20089 0 0 1413607761 e pipe failed
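To read that line: the kernel is reporting that it tried to hand a crashing process's core dump to the abrt helper through a pipe (the leading | in the core pattern) and could not. The numbers after the helper path are the values substituted into the core pattern. The sketch below decodes them, assuming the CentOS 6 abrt pattern `%s %c %p %u %g %t e` (signal, core size limit, PID, UID, GID, dump time in epoch seconds, then a literal `e`). Check /proc/sys/kernel/core_pattern on your node before trusting that field order; the function name is hypothetical.

```python
import re
import signal
from datetime import datetime, timezone

# Assumed core_pattern (CentOS 6 abrt-ccpp); verify with: cat /proc/sys/kernel/core_pattern
#   |/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e
FIELDS = ("signal", "core_limit", "pid", "uid", "gid", "epoch")

LINE_RE = re.compile(
    r"^\[\s*(?P<uptime>\d+\.\d+)\]\s+Core dump to \|(?P<helper>\S+)\s+"
    r"(?P<args>.*?)\s+pipe failed$"
)


def decode_core_dump_line(line):
    """Decode a 'Core dump to |helper ... pipe failed' kernel message, or return None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    args = match.group("args").split()
    if len(args) < len(FIELDS):
        return None  # not the argument layout this sketch assumes
    info = {
        "uptime_s": float(match.group("uptime")),  # seconds since boot
        "helper": match.group("helper"),
    }
    # zip() stops at the shorter sequence, so the trailing literal "e" is skipped
    for name, value in zip(FIELDS, args):
        info[name] = int(value)
    try:
        info["signal_name"] = signal.Signals(info["signal"]).name
    except ValueError:
        info["signal_name"] = f"signal {info['signal']}"
    info["time_utc"] = datetime.fromtimestamp(info["epoch"], tz=timezone.utc)
    return info
```

Under that assumption, the line above decodes to signal 11 (SIGSEGV) for PID 20089 running as UID 0, dumped at 2014-10-18 04:49:21 UTC. The bracketed 26138940.488268 is seconds since boot (about 302.5 days), so it only lines up with other messages from the same boot.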
Re: Core Dump Error Message
Posted: Sat Oct 18, 2014 10:02 am
by mikeshinn
I can only speak to the abrt message: it means some application crashed and dumped core, and the core was sent to the Red Hat "abrt" daemon, which collects debug information for Red Hat.
Re: Core Dump Error Message
Posted: Sat Oct 18, 2014 10:32 am
by faris
I used to see this all over the place in the logs on our VZ machines. You can stop this by stopping the abrtd service (and at least one other service starting with abrt if I recall correctly).
In *my case* abrtd was complaining because it didn't have enough room in, I think, /boot to save the core dump. You can change the location but I never bothered.
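If you want to see where core dumps are currently going before (or after) stopping those services, the kernel setting to look at is /proc/sys/kernel/core_pattern on the HN: a value starting with | means each dump is piped to the named helper program, while anything else is a file path template. A small read-only sketch (the function names are hypothetical; it changes nothing):

```python
from pathlib import Path

CORE_PATTERN_FILE = Path("/proc/sys/kernel/core_pattern")


def pipe_helper(pattern):
    """Return the helper program if core dumps are piped to one, otherwise None."""
    pattern = pattern.strip()
    if not pattern.startswith("|"):
        return None  # dumps are written to a file path built from this template
    parts = pattern[1:].split()
    return parts[0] if parts else None


def dumps_go_to_abrt(path=CORE_PATTERN_FILE):
    """True if the current core_pattern pipes dumps to an abrt hook."""
    helper = pipe_helper(Path(path).read_text())
    return helper is not None and "abrt" in Path(helper).name
```

Calling dumps_go_to_abrt() on the HN should return True while the abrt hook is installed.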
And the core dumps were, I think, a result of a running program being killed by VZ due to OOM within a container - at least in my case that's what was happening. So it wasn't VZ itself erroring/failing, but something running in a container. It could equally be something running on the HN itself, or in the service container mind you (e.g. the backup!)
Check for OOMs within the container that happen at the same time as these to confirm this. They should be visible in /var/log/messages on the HN.
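One way to do that check is to pull out the core-dump and OOM lines and pair them by the kernel's bracketed uptime stamp. This is a rough sketch: the OOM phrases it looks for are assumptions (Virtuozzo kernels may word OOM kills differently), the 5-second window is arbitrary, and the function names are hypothetical.

```python
import re

STAMP_RE = re.compile(r"\[\s*(\d+\.\d+)\]")  # kernel uptime stamp, e.g. [26138940.488268]
# Assumed wording; adjust to whatever your kernel actually logs for OOM kills.
OOM_MARKERS = ("out of memory", "oom-killer", "oom killed")
CORE_MARKER = "core dump to |"


def classify(line):
    """Label a log line as 'core', 'oom', or None."""
    lowered = line.lower()
    if CORE_MARKER in lowered:
        return "core"
    if any(marker in lowered for marker in OOM_MARKERS):
        return "oom"
    return None


def correlate(lines, window=5.0):
    """Pair each core-dump line with the OOM lines stamped within `window` seconds of it."""
    events = {"core": [], "oom": []}
    for line in lines:
        kind = classify(line)
        stamp = STAMP_RE.search(line)
        if kind is not None and stamp is not None:
            events[kind].append((float(stamp.group(1)), line.strip()))
    pairs = []
    for core_time, core_line in events["core"]:
        nearby = [oom_line for oom_time, oom_line in events["oom"]
                  if abs(oom_time - core_time) <= window]
        pairs.append((core_line, nearby))
    return pairs
```

Feed it an open /var/log/messages (or dmesg output); a core dump with an empty OOM list was probably not caused by an OOM kill. Uptime stamps restart at every boot, so only compare lines from the same boot.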
Please note the usual "I think"s and "in my case"s here!!!
I don't want to be responsible for causing you more stress or problems.
Re: Core Dump Error Message
Posted: Sat Oct 18, 2014 2:41 pm
by KrazyBob
You guys are great! Funny, too, Faris. LOL.
My big issue is that I cannot run a backup. When I try, the load skyrockets to (in last night's case) 88. I found the offending container and noted that the customer has moved a lot of CGI and Python scripts onto a Plesk 11 server. But I think that before I can reboot the server I MUST make backups, and it won't let me. Sadly, I only noticed this problem when I saw that a good backup hadn't run since July. I never got an error email, and then discovered that Outlook/Norton had flagged the email as SPAM. Dammit!
I have tried the CLI using:
Code:
vzbackup -I -Cg --no-split 192.168.220.106 -e 113
but all I get is a high load.
Next I tried using the vzback.conf file and:
Code:
vzbackup -I -p --no-split 192.168.220.106 -e 113
to force the max_load_avg to be followed but I end up with a zero byte backup that doesn't complete.
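If vzbackup's own max_load_avg setting doesn't hold the job back, one workaround is to gate the start of the backup yourself. This sketch (hypothetical function name; you choose the threshold) polls the 1-minute load average via Python's os.getloadavg() and returns once it drops below the threshold, or gives up after an optional timeout. It only delays the start; it cannot throttle a backup that is already running.

```python
import os
import time


def wait_for_low_load(max_load, poll_seconds=60, timeout=None,
                      loadavg=None, sleep=time.sleep):
    """Wait until the 1-minute load average is below max_load.

    Returns True when the load is low enough, or False once `timeout`
    seconds of polling have passed. `loadavg` and `sleep` can be swapped
    out for testing; by default the real os.getloadavg() is used.
    """
    if loadavg is None:
        loadavg = os.getloadavg  # Unix only
    waited = 0
    while True:
        if loadavg()[0] < max_load:
            return True
        if timeout is not None and waited >= timeout:
            return False
        sleep(poll_seconds)
        waited += poll_seconds
```

For example, call it with a threshold such as 4.0 and a timeout of an hour, and start vzbackup (for instance through subprocess.run) only if it returns True.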
Lastly I tried just using:
even though it doesn't allow me, from what I have read, to specify --nosplit.
Bottom line: no backups!
Next recourse is to use vzmigrate and move the containers to another server without shutting them off but VZ 4.7 doesn't allow the --lazy flag:
Code:
vzmigrate -r no --online --lazy --keep-dst 192.168.220.104 113
If I drop off the --lazy I again get a high load. I suspect that the server needs to be rebooted but once again without a backup I'm hosed.
So I guess now is the time to ask which alternate command lines might work. In the past I have always used a simple command line, but as I said, it stopped working and I didn't notice without an error email.
Re: Core Dump Error Message
Posted: Sun Oct 19, 2014 6:38 pm
by mikeshinn
KrazyBob wrote: "I try and the load skyrockets to, in last nights case, 88"
Is the load from I/O or something else?
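To answer that question: on Linux the load average counts processes that are runnable plus those in uninterruptible sleep (state D, usually waiting on disk), so a high load with a large iowait share and many D-state processes points at I/O rather than CPU. A rough sketch that measures both from /proc (Linux only; the function names are hypothetical):

```python
import glob
import time


def cpu_ticks(stat_text):
    """Return the aggregate 'cpu' line of /proc/stat as a list of tick counts."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            return [int(value) for value in line.split()[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two /proc/stat snapshots."""
    deltas = [b - a for a, b in zip(cpu_ticks(before), cpu_ticks(after))]
    # user, nice, system, idle, iowait, irq, softirq, steal; guest time is
    # already included in user, so later columns are left out of the total.
    total = sum(deltas[:8])
    return 100.0 * deltas[4] / total if total else 0.0


def count_uninterruptible(stat_texts):
    """Count processes in state D, given the contents of /proc/<pid>/stat files."""
    count = 0
    for text in stat_texts:
        # The command name is in parentheses and may itself contain ')'.
        fields = text.rsplit(")", 1)[-1].split()
        if fields and fields[0] == "D":
            count += 1
    return count


def sample(interval=5.0):
    """Return (iowait %, D-state count) measured over `interval` seconds."""
    with open("/proc/stat") as f:
        before = f.read()
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = f.read()
    stats = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                stats.append(f.read())
        except OSError:
            pass  # the process exited while we were looking
    return iowait_percent(before, after), count_uninterruptible(stats)
```

Run sample() on the HN while the backup is pushing the load up; vzstat's "procs ... D" figure gives a similar D-state count.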
Re: Core Dump Error Message
Posted: Sun Oct 19, 2014 10:53 pm
by KrazyBob
I don't know, or how to find out. I can say that it won't let me back up the entire server using vzbackup, or migrate it using vzmigrate with the flags I am setting.
What vzmigrate flags do you recommend?
Re: Core Dump Error Message
Posted: Mon Oct 20, 2014 4:57 am
by faris
How about shutting the container down and trying the backup again?
iotop is a great tool for checking what's eating your I/O, as is htop.
I think both are in the rpmforge repo and work fine with VZ.
iotop is the simplest:
# iotop -oaP
o = only running stuff
a = accumulated (press A to toggle accumulated mode on and off; when on, it keeps the processes with the most accumulated disk I/O at the top)
P = show processes not threads
Things to look out for: syslog/rsyslog using a lot of I/O. There's usually not much you can do about it, but it is a bit of an indication that disk I/O isn't brilliant.
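You can also use ionice to reduce the I/O priority of a process (e.g. the backup) to help prevent the high load, if the load is I/O-related.
ionice -c2 -n [0-7] -p [PID]
(-c2 selects the best-effort class; within it, 0 is the highest priority and 7 the lowest.)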
See
http://www.cyberciti.biz/tips/linux-set ... ority.html
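Instead of renicing a running PID, you can also start the whole job at low CPU and I/O priority. The helper below (a hypothetical name) prefixes a command with nice and ionice. Assumptions: only the best-effort (2) and idle (3) classes are allowed; I/O classes only take effect with the CFQ disk scheduler; and the idle class can starve the job completely on a disk that is never idle.

```python
import shlex


def low_priority(cmd, io_class=3, io_level=None, cpu_nice=19):
    """Prefix `cmd` (a list of arguments) with nice and ionice.

    io_class: 2 = best-effort, 3 = idle (only gets disk time when no one else wants it).
    io_level: 0 (highest) to 7 (lowest); only valid for the best-effort class.
    """
    if io_class not in (2, 3):
        raise ValueError("io_class must be 2 (best-effort) or 3 (idle)")
    prefix = ["nice", "-n", str(cpu_nice), "ionice", "-c", str(io_class)]
    if io_level is not None:
        if io_class != 2:
            raise ValueError("io_level only applies to the best-effort class")
        if not 0 <= io_level <= 7:
            raise ValueError("io_level must be between 0 and 7")
        prefix += ["-n", str(io_level)]
    return prefix + list(cmd)


def as_shell(cmd):
    """Render the argument list as a copy-pasteable shell command."""
    return " ".join(shlex.quote(arg) for arg in cmd)
```

For example, as_shell(low_priority(["vzbackup", "-I", "-p", "--no-split", "192.168.220.106", "-e", "113"])) gives `nice -n 19 ionice -c 3 vzbackup -I -p --no-split 192.168.220.106 -e 113`; the list itself can be passed straight to subprocess.run.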
Also look at vzstat which can be very useful.
I don't use vzbackup anymore but I never did anything different about it other than increasing the timeout somewhere in one conf file or other.
Re: Core Dump Error Message
Posted: Mon Oct 20, 2014 4:59 am
by faris
Oh, hang on. You said VZ 4.7? Try using vzabackup instead of vzbackup to see if it makes any difference.
Re: Core Dump Error Message
Posted: Mon Oct 20, 2014 5:16 am
by KrazyBob
You are an excellent resource! I tried using vzabackup but it doesn't like the --nosplit flag. Do you allow multi-part backups, which I don't? I prefer one backup file. I use this:
Code:
vzabackup -I -p --no-split 192.168.220.106 -e 113
but again it doesn't like the nosplit and the load still goes upwards to ~100. I am pretty sure that there is a failed app and I need to do a reboot. But I am afraid to do so without a current backup.
Using vzabackup required me to log in even though I have a key set, whereas vzbackup doesn't require me to log in.
As soon as I ran vzabackup the load shot to 70 and I don't know how I aborted it.
Containers are being served. Would you consider rebooting the server without a current backup?
What is a sample command line for vzmigrate, please? It doesn't have a --lazy option that I can nurse along.
Re: Core Dump Error Message
Posted: Mon Oct 20, 2014 12:27 pm
by faris
Yeah, vzabackup doesn't have a split option.
I think you can just do something like this:
vzmigrate -r no root@destination-node CTID
-r no means keep a copy of the container on the source node. Makes moving it back really quick.
The main options are:
--online (which reduces downtime, and doesn't reboot the CT - user doesn't know anything happened. Doesn't always work, and requires certain modules to be present on both nodes to work, which aren't installed as standard)
--dry-run (does what it says on the tin)
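Putting those options together, a cautious sequence is a dry run first and the real migration only if the dry run succeeds. This sketch only builds command lines from the flags above (-r no, --online, --dry-run) in the argument order used in this thread and runs them; it assumes vzmigrate exits non-zero when the dry run finds a problem, and the function names are hypothetical. Check man vzmigrate on your version before relying on it.

```python
import subprocess


def vzmigrate_cmd(ctid, destination, keep_source=True, online=False, dry_run=False):
    """Build a vzmigrate argument list from the options discussed above."""
    cmd = ["vzmigrate"]
    if keep_source:
        cmd += ["-r", "no"]  # keep the container's copy on the source node
    if online:
        cmd.append("--online")  # needs the extra modules on both nodes
    if dry_run:
        cmd.append("--dry-run")
    return cmd + [destination, str(ctid)]


def dry_run_then_migrate(ctid, destination, online=False, run=subprocess.run):
    """Run a dry run first; only migrate for real if it exits with status 0."""
    check = run(vzmigrate_cmd(ctid, destination, online=online, dry_run=True))
    if check.returncode != 0:
        return check.returncode
    return run(vzmigrate_cmd(ctid, destination, online=online)).returncode
```

For example, dry_run_then_migrate(113, "root@destination-node") runs `vzmigrate -r no --dry-run root@destination-node 113` and then, if that succeeds, the same command without --dry-run.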
For vzabackup from one server to another, you'd typically do something like (I think the syntax is correct, otherwise double-check with man vzabackup)
vzabackup -F root@source-node --storage root@storage-node -e CTID
If you are not sending the backup to a separate storage node, omit --storage root@storage-node.
Also you can use --force to ignore errors.
And yes, *I* would reboot without the backup (but this doesn't mean you should!).
I'd try restarting the container first. If it restarts, then you know it will come back up when the HN boots, unless you have a disk failure. If it doesn't restart, then you know a backup would not do you any good and there's some sort of damage in the container that needs addressing anyway. That's my logic, anyway.
Incidentally, have you tried cloning the container? That would effectively be a backup. Just make sure the clone isn't set to start at boot or right after it is created.
Re: Core Dump Error Message
Posted: Mon Oct 20, 2014 12:35 pm
by KrazyBob
I am most worried that there is a disk problem, although dmesg is clear except for the abrt-hook lines.
Any attempt to back up the node gives me a high CPU load. vzstat tells me that I am out of swap space, but I don't know why. The server is intended for two VZ containers, using dual AMD Opteron 248s and 4GB of RAM. This has worked for years, but it looks like the client has moved a lot of web sites onto one of the containers without my knowing. I am terrified to do a reboot without a current backup.
Re: Core Dump Error Message
Posted: Tue Oct 21, 2014 8:59 am
by faris
Well, do a backup with the container shut down. If that still doesn't work... try excluding /var/named from the backup (there are some circular references in there or something).
Re: Core Dump Error Message
Posted: Tue Oct 21, 2014 2:05 pm
by KrazyBob
Will do, and thank you for taking the time to reply. I don't mind reading the manuals (although Plesk has one for turning rack mount screws - LOL), and I will. I don't mind getting my hands dirty, either. It's just that when I am stumped, I ask for help.
I have been watching vzstat for a couple of days, and I've noticed that Plesk 11 and 12 seem to set up PHP (and Python) to run as CGI, resulting in heavy loads. Mailman is one example, with its gate_news cron job.
These are two small Plesk 11 servers that had cron jobs set up to back them up. It appears that my SPAM filter was snagging the result emails telling me that the backups had failed.
Code:
10:31am, up 306 days, 1:31, 1 user, load average: 0.64, 0.58, 0.69
CTNum 3, procs 504: R 4, S 498, D 0, Z 2, T 0, X 0
CPU [ OK ]: CTs 97%, CT0 0%, user 39%, sys 12%, idle 49%, lat(ms) 9/0
Mem [ OK ]: total 3820MB, free 173MB, lat(ms) 1/0
ZONE0 (DMA): size 14MB, act 4MB, inact 2MB, free 7MB (0/0/0)
ZONE1 (DMA32): size 2992MB, act 947MB, inact 695MB, free 163MB (5/7/8)
ZONE2 (Normal): size 1008MB, act 229MB, inact 40MB, free 2MB (1/2/2)
Mem lat (ms): A0 0, K0 0, U0 1, K1 0, U1 0
Slab pages: 161MB/161MB (ino 59MB, de 0MB, bh 10MB, pb 0MB)
Swap [ OK ]: tot 12287MB, free 11464MB, in 0.000MB/s, out 0.000MB/s
Net [ OK ]: tot: in 0.004MB/s 45pkt/s, out 0.033MB/s 46pkt/s
lo: in 0.000MB/s 0pkt/s, out 0.000MB/s 0pkt/s
eth0: in 0.004MB/s 45pkt/s, out 0.033MB/s 46pkt/s
eth1: in 0.000MB/s 0pkt/s, out 0.000MB/s 0pkt/s
Disks [ OK ]: in 0.023MB/s, out 0.000MB/s
CTID ST %VM %KM PROC CPU SOCK FCNT MLAT IP
1 OK 0.0/6.7 0.0/0.3 0/3/240 0.00/33 20/720 0 0 xx.xx.xx
113 OK 23/54 1.4/MAX 0/154/MAX 0.21/33 303/MAX 0 xx.xx.xx
206 OK 27/54 1.1/MAX 0/103/MAX 48.7/33 207/MAX 0 xx.xx.xx
[root@hw006 vz-scripts]# cat 113.conf
#This is an example configuration file for so-called "basic" Container.
#<agent>: Configuration file for allocating 256 Mb of memory.
#
# Copyright (C) Parallels, 1999-2010. All rights reserved.
VERSION="2"
ONBOOT="yes"
PHYSPAGES="524288:524288"
SWAPPAGES="262144"
VM_OVERCOMMIT="1.5"
CPUUNITS="1000"
DISKSPACE="262144000:362144000"
DISKINODES="13495934:14845528"
QUOTATIME="0"
OFFLINE_MANAGEMENT="yes"
IP_ADDRESS="blanked for security"
ARCH="x86_64"
PLATFORM="linux"
VE_ROOT="/vz/root/$VEID"
VE_PRIVATE="/vz/private/$VEID"
OSTEMPLATE=".centos-6-x86_64"
DISTRIBUTION="redhat-el6"
TECHNOLOGIES="x86_64 nptl"
ORIGIN_SAMPLE="2048"
VEFORMAT="vz4"
VEID="113"
HOSTNAME="blanked for security"
NAMESERVER="4.2.2.1 4.2.2.2"
CONFIG_CUSTOMIZED="yes"
OFFLINE_SERVICE="vzpp"
DISABLED="no"
TEMPLATES=""
QUOTAUGIDLIMIT="950"
PRIVVMPAGES="2597004:2607396"
[root@hw006 vz-scripts]# cat /proc/user_beancounters
Version: 2.5
uid resource held maxheld barrier limit failcnt
206: kmemsize 43791991 44679168 9223372036854775807 9223372036854775807 0
lockedpages 0 0 524288 524288 0
privvmpages 387307 396542 1179648 1179648 0
shmpages 551 551 9223372036854775807 9223372036854775807 0
dummy 0 0 9223372036854775807 9223372036854775807 0
numproc 102 111 9223372036854775807 9223372036854775807 0
physpages 260980 265069 524288 524288 0
vmguarpages 0 0 786432 786432 0
oomguarpages 244689 245048 524288 524288 0
numtcpsock 38 41 9223372036854775807 9223372036854775807 0
numflock 15 19 9223372036854775807 9223372036854775807 0
numpty 0 0 9223372036854775807 9223372036854775807 0
numsiginfo 0 6 9223372036854775807 9223372036854775807 0
tcpsndbuf 1170792 1308264 9223372036854775807 9223372036854775807 0
tcprcvbuf 622592 671744 9223372036854775807 9223372036854775807 0
othersockbuf 246608 546720 9223372036854775807 9223372036854775807 0
dgramrcvbuf 0 2312 9223372036854775807 9223372036854775807 0
numothersock 171 181 9223372036854775807 9223372036854775807 0
dcachesize 19927179 19935234 9223372036854775807 9223372036854775807 0
numfile 4546 4605 9223372036854775807 9223372036854775807 0
dummy 0 0 9223372036854775807 9223372036854775807 0
dummy 0 0 9223372036854775807 9223372036854775807 0
dummy 0 0 9223372036854775807 9223372036854775807 0
numiptent 14 14 9223372036854775807 9223372036854775807 0
113: kmemsize 53467299 54534144 9223372036854775807 9223372036854775807 0
lockedpages 0 0 524288 524288 0
privvmpages 729538 730176 2597004 2607396 0
shmpages 695 695 9223372036854775807 9223372036854775807 0
dummy 0 0 9223372036854775807 9223372036854775807 0
numproc 156 165 9223372036854775807 9223372036854775807 0
physpages 234175 235196 524288 524288 0
vmguarpages 0 0 786432 786432 0
oomguarpages 219726 219991 524288 524288 0
numtcpsock 63 72 9223372036854775807 9223372036854775807 0
numflock 16 21 9223372036854775807 9223372036854775807 0
numpty 0 0 9223372036854775807 9223372036854775807 0
numsiginfo 0 6 9223372036854775807 9223372036854775807 0
tcpsndbuf 1197216 1823800 9223372036854775807 9223372036854775807 0
tcprcvbuf 1032192 1179648 9223372036854775807 9223372036854775807 0
othersockbuf 357024 473808 9223372036854775807 9223372036854775807 0
dgramrcvbuf 0 2312 9223372036854775807 9223372036854775807 0
numothersock 240 248 9223372036854775807 9223372036854775807 0
dcachesize 16414765 16437408 9223372036854775807 9223372036854775807 0
numfile 4636 4915 9223372036854775807 9223372036854775807 0
dummy 0 0 9223372036854775807 9223372036854775807 0
dummy 0 0 9223372036854775807 9223372036854775807 0
dummy 0 0 9223372036854775807 9223372036854775807 0
numiptent 14 14 9223372036854775807 9223372036854775807 0
1: kmemsize 420826 843776 11055923 11377049 0
lockedpages 0 0 256 256 0
privvmpages 61 513 65536 69632 0
shmpages 0 0 21504 21504 0
dummy 0 0 9223372036854775807 9223372036854775807 0
numproc 3 9 240 240 0
physpages 104 968 65536 65536 0
vmguarpages 0 0 33792 2147483647 0
oomguarpages 13 16 26112 2147483647 0
numtcpsock 2 6 360 360 0
numflock 0 0 188 206 0
numpty 0 0 16 16 0
numsiginfo 0 0 256 256 0
tcpsndbuf 34880 104640 1720320 2703360 0
tcprcvbuf 32768 98304 1720320 2703360 0
othersockbuf 0 8456 1126080 2097152 0
dgramrcvbuf 0 0 262144 262144 0
numothersock 18 19 360 360 0
dcachesize 30063 61023 3409920 3624960 0
numfile 5 20 9312 9312 0
dummy 0 0 9223372036854775807 9223372036854775807 0
dummy 0 0 9223372036854775807 9223372036854775807 0
dummy 0 0 9223372036854775807 9223372036854775807 0
numiptent 14 14 128 128 0
0: kmemsize 97224576 587595776 9223372036854775807 9223372036854775807 0
lockedpages 370860 370860 9223372036854775807 9223372036854775807 0
privvmpages 571780 2563497 9223372036854775807 9223372036854775807 0
shmpages 957 16702 9223372036854775807 9223372036854775807 0
dummy 0 0 9223372036854775807 9223372036854775807 0
numproc 245 274 9223372036854775807 9223372036854775807 0
physpages 417657 805377 9223372036854775807 9223372036854775807 0
vmguarpages 0 0 0 0 0
oomguarpages 425617 519328 9223372036854775807 9223372036854775807 0
numtcpsock 28 45 9223372036854775807 9223372036854775807 0
numflock 8 18 9223372036854775807 9223372036854775807 0
numpty 1 3 9223372036854775807 9223372036854775807 0
numsiginfo 1 135 9223372036854775807 9223372036854775807 0
tcpsndbuf 505760 3638192 9223372036854775807 9223372036854775807 0
tcprcvbuf 458752 737280 9223372036854775807 9223372036854775807 0
othersockbuf 397664 762208 9223372036854775807 9223372036854775807 0
dgramrcvbuf 0 16744 9223372036854775807 9223372036854775807 0
numothersock 263 310 9223372036854775807 9223372036854775807 0
dcachesize 50072906 546476318 9223372036854775807 9223372036854775807 0
numfile 1433 1995 9223372036854775807 9223372036854775807 0
dummy 0 0 9223372036854775807 9223372036854775807 0
dummy 0 0 9223372036854775807 9223372036854775807 0
dummy 0 0 9223372036854775807 9223372036854775807 0
numiptent 20 20 9223372036854775807 9223372036854775807 0
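For reading a dump like this: the /proc/user_beancounters columns are held, maxheld, barrier, limit and failcnt; a non-zero failcnt means allocations were refused, and 9223372036854775807 (LONG_MAX) means unlimited. On the output above every failcnt is 0. The page-based resources count memory pages, so assuming 4 KiB pages, PHYSPAGES="524288" works out to 524288 × 4 KiB = 2048 MiB (matching ORIGIN_SAMPLE="2048") and SWAPPAGES="262144" to 1024 MiB. The sketch below parses the file and flags refused allocations or resources whose maxheld reached 80% of the barrier; the 80% threshold, the page size and the function names are assumptions.

```python
UNLIMITED = 9223372036854775807  # LONG_MAX, printed for "no limit"
PAGE_KIB = 4  # assumed x86_64 page size
COLUMNS = ("held", "maxheld", "barrier", "limit", "failcnt")


def parse_beancounters(text):
    """Return {ctid: {resource: {column: value}}} from /proc/user_beancounters text."""
    counters = {}
    ctid = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] in ("Version:", "uid"):
            continue  # blank line, version banner or column header
        if fields[0].endswith(":"):
            ctid = fields[0][:-1]  # "113:" starts the block for container 113
            fields = fields[1:]
        if ctid is None or len(fields) != 6 or fields[0] == "dummy":
            continue
        values = dict(zip(COLUMNS, (int(v) for v in fields[1:])))
        counters.setdefault(ctid, {})[fields[0]] = values
    return counters


def findings(counters, warn_ratio=0.8):
    """Flag refused allocations and resources whose peak reached warn_ratio of the barrier."""
    flagged = []
    for ctid, resources in counters.items():
        for name, v in resources.items():
            if v["failcnt"] > 0:
                flagged.append((ctid, name, f"failcnt={v['failcnt']}"))
            elif v["barrier"] not in (0, UNLIMITED) and v["maxheld"] >= warn_ratio * v["barrier"]:
                flagged.append((ctid, name, f"maxheld {v['maxheld']} of barrier {v['barrier']}"))
    return flagged


def pages_to_mib(pages):
    """Convert a page count to MiB, assuming PAGE_KIB-sized pages."""
    return pages * PAGE_KIB / 1024
```

An empty findings(parse_beancounters(open("/proc/user_beancounters").read())) list means no container has hit or come close to its barriers.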
Re: Core Dump Error Message
Posted: Tue Oct 21, 2014 2:49 pm
by faris
Press "0" when looking at vzstat to see the HN's own stats (listed as Container 0). Also try pressing "a" which gives you an average rather than an instantaneous figure.
Looks healthy to me though.
Re: Core Dump Error Message
Posted: Sun Feb 08, 2015 10:05 am
by KrazyBob
A backup was impossible, so I crossed my fingers and rebooted, successfully. But dammit: Plesk won't let me put in a cron job and forces me to log in, yet my key works if I ssh in. I'm trying to do a kernel update, but vzup2date doesn't seem to have it. I need to spend more time in here, but I am injured and bedridden, and typing is difficult.