99-raid-check

premierhosting
Forum Regular
Posts: 257
Joined: Wed Aug 04, 2010 2:52 pm

99-raid-check

Hi Guys,

99-raid-check runs from cron.weekly at 4:22 am on Sundays. It's for checking a software RAID 1 array.
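On a stock CentOS 5 system that Sunday 4:22 am slot normally comes from /etc/crontab, which hands the whole cron.weekly directory to run-parts (the stock line is `22 4 * * 0 root run-parts /etc/cron.weekly`). A minimal sketch to confirm the schedule on your own box, assuming the standard system-crontab field layout; the function name `weekly_schedule` is hypothetical:

```bash
# Hypothetical helper: print the schedule of uncommented /etc/crontab line(s)
# that run cron.weekly. Assumes the system crontab layout:
# minute hour day-of-month month day-of-week user command
weekly_schedule() {
    local crontab_file="${1:-/etc/crontab}"
    awk '$1 !~ /^#/ && /cron\.weekly/ {
        printf "minute=%s hour=%s day-of-week=%s\n", $1, $2, $5
    }' "$crontab_file"
}

# Usage: weekly_schedule               (reads /etc/crontab)
#        weekly_schedule /path/to/copy (reads a saved copy)
```

Day-of-week 0 is Sunday, so a stock file prints `minute=22 hour=4 day-of-week=0`.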

Perhaps this has something to do with my server slowing to a crawl and hanging weekly, usually late Sunday night / early Monday morning.

Apparently this is new in RHEL 5.4 or 5.5 (and hence CentOS 5.5). Any suggestions for either debugging the issue or alleviating it? Could I safely disable this RAID check, at least for one week to test the theory? I'm running this on:

uname -a
Linux hostname 2.6.32.28-1.art.x86_64 #1 SMP Mon Feb 14 11:06:49 EST 2011 x86_64 x86_64 x86_64 GNU/Linux

The concern I have is that it's a glitch between the kernel and the RAID stuff. The server provider (1and1) basically says nothing is wrong with the server and that it must be a configuration issue. Crashing once a week is not cool. :)

Any ideas at all would be much appreciated.
scott
Atomicorp Staff - Site Admin
Posts: 8355
Location: earth

Re: 99-raid-check

That does seem suspicious. How are your RAIDs put together? We have a bunch of servers at 1and1 as well (rebuilt with AOOI) and I've never had anything like that happen. That script forces a sync event on the RAID if the array is idle, which isn't terribly risky. It makes me wonder whether the array is actually damaged.
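To make that description concrete: md exposes a per-array sysfs file, sync_action, which reads `idle` when nothing is running, and writing `check` to it starts a consistency check. The sketch below only illustrates that idea; it is not the actual 99-raid-check code, which has more options. The function name and the md0 path in the usage line are assumptions.

```bash
# Hypothetical sketch: start an md consistency check only if the array is idle.
# md_dir is the array's md sysfs directory, e.g. /sys/block/md0/md (assumed name).
start_check_if_idle() {
    local md_dir="$1"
    local action
    action=$(cat "$md_dir/sync_action") || return 2   # cannot read array state
    if [ "$action" = "idle" ]; then
        echo check > "$md_dir/sync_action"            # kernel starts the check
        echo "started check in $md_dir"
    else
        echo "skipped: array busy ($action)"
        return 1
    fi
}

# Usage (as root): start_check_if_idle /sys/block/md0/md
# Progress then shows up in /proc/mdstat.
```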
premierhosting
Forum Regular

Re: 99-raid-check

scott wrote: "how are your raids put together?"
I'm not sure how to answer this question. What are you looking for? How would I tell if the array is damaged? It reports clean in the mdadm check. When I brought this server online I didn't use AOOI; the normal route seemed to work, and AOOI looked riskier.
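On the "how would I tell" question: besides `mdadm --detail /dev/md0`, /proc/mdstat shows one letter per member in brackets, `U` for up and `_` for missing, so a healthy two-disk RAID 1 reads `[UU]`. A minimal sketch that flags any array with a missing member; the function name is hypothetical, and it only catches degraded arrays, not silent mismatches (after a check, md reports those in /sys/block/mdX/md/mismatch_cnt).

```bash
# Hypothetical helper: print md arrays whose status field contains "_"
# (a missing or failed member), e.g. [U_] instead of [UU].
# Reads /proc/mdstat by default; a file argument allows testing on saved output.
degraded_arrays() {
    local mdstat="${1:-/proc/mdstat}"
    awk '
        /^md[0-9]+ :/ { dev = $1 }                    # start of an array stanza
        dev != "" && match($0, /\[[U_]+\]/) {         # member-status field, e.g. [UU]
            if (index(substr($0, RSTART, RLENGTH), "_") > 0) print dev
            dev = ""
        }
    ' "$mdstat"
}

# Usage: degraded_arrays   (prints nothing when every array shows only U's)
```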
BruceLee
Forum Regular
Posts: 879
Joined: Sat Mar 28, 2009 6:58 pm
Location: Germany

Re: 99-raid-check

First I would check whether it really is this raid-check script, by disabling it for testing or letting it run on another day.
Otherwise you start digging for a misconfigured RAID or a bad hard drive even if there is none.
premierhosting
Forum Regular

Re: 99-raid-check

I ran the script manually and watched it finish; about 12 hours later, give or take, the server hung again. Since the weekly run has been starting at 4:33 am on Sundays and probably finishing 4-5 hours later, the late-Sunday-night hang comes roughly the same interval after it finishes. Is there any logic to that being causal? I've disabled the script in cron.weekly, so we'll see how things go this weekend.
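Removing the file from cron.weekly works, but a package update may put it back. As far as I know, the RHEL/CentOS raid-check script also reads /etc/sysconfig/raid-check and does nothing unless ENABLED=yes; check your installed copy before relying on that. A minimal sketch of flipping that switch with a backup; the function name is hypothetical:

```bash
# Hypothetical helper: set ENABLED=no in a raid-check config file.
# Default path assumes the RHEL/CentOS layout; verify it exists on your system.
disable_raid_check() {
    local conf="${1:-/etc/sysconfig/raid-check}"
    if [ ! -f "$conf" ]; then
        echo "no such file: $conf" >&2
        return 1
    fi
    cp -p "$conf" "$conf.bak"                        # keep a copy for easy rollback
    if grep -q '^ENABLED=' "$conf"; then
        sed -i 's/^ENABLED=.*/ENABLED=no/' "$conf"   # GNU sed in-place edit
    else
        echo 'ENABLED=no' >> "$conf"                 # no switch present: add one
    fi
}

# Usage (as root): disable_raid_check
# Undo: mv /etc/sysconfig/raid-check.bak /etc/sysconfig/raid-check
```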
BruceLee
Forum Regular

Re: 99-raid-check

It could still be some other script, or a combination of two. Check all cron jobs and logs.
Do you monitor CPU, RAM, swap usage, etc. with Nagios, MRTG or something similar? It's always helpful to have some graphs for comparison.
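For the "check all cron jobs" step, a minimal sketch that lists the executable scripts in the usual RHEL/CentOS run-parts directories. It is an approximation (run-parts has its own skip rules), the function name is hypothetical, and /etc/crontab, /etc/cron.d and per-user crontabs (`crontab -l -u root`) still need to be read separately.

```bash
# Hypothetical helper: print executable regular files in the given cron directories.
# With no arguments it scans the standard RHEL/CentOS run-parts directories.
list_cron_jobs() {
    local dir f
    if [ "$#" -eq 0 ]; then
        set -- /etc/cron.hourly /etc/cron.daily /etc/cron.weekly /etc/cron.monthly
    fi
    for dir in "$@"; do
        [ -d "$dir" ] || continue                    # skip missing directories
        for f in "$dir"/*; do
            if [ -f "$f" ] && [ -x "$f" ]; then
                echo "$f"
            fi
        done
    done
}

# Usage: list_cron_jobs
```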
premierhosting
Forum Regular

Re: 99-raid-check

I haven't succeeded in getting Nagios or MRTG going; I'll need to try again. I've been appending uptime to a log file from cron every 5 minutes to see whether there is a load spike before the hang, and nothing is way off. Sometimes it shows a load a little over 1, but that's not a big deal.
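A variant of that 5-minute logging that stamps each sample with the time, so any spike lines up with the hang. This is a minimal sketch: the function name and log path are hypothetical, and it reads the three load averages straight from /proc/loadavg.

```bash
# Hypothetical logger: append "timestamp load1 load5 load15" to a log file.
# The second argument exists only so the function can be tested on a sample file.
log_load() {
    local logfile="$1"
    local src="${2:-/proc/loadavg}"
    local one five fifteen rest
    read -r one five fifteen rest < "$src"
    printf '%s load1=%s load5=%s load15=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$one" "$five" "$fifteen" >> "$logfile"
}

# Example: call "log_load /var/log/load-history.log" from a script that cron
# runs every 5 minutes (schedule field: */5 * * * *).
```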

I've pored over log files and have nothing useful to show for it. Going through the cron jobs is what brought up this possibility. The other weekly cron jobs are:
-rwxr-xr-x 1 root root 380 Mar 27 2007 0anacron
-rwxr-xr-x 1 root root 146 Oct 29 05:02 50plesk-weekly
-rwxr-xr-x 1 root root 251 Sep 20 10:05 asl-webapp-inventory
-rwxr-xr-x 1 root root 414 Jan 6 2007 makewhatis.cron
The only other thing I run from cron is a mysql_backup shell script on each MySQL table, which allows quick recovery if a customer's data goes bad. These runs are staggered every 15 minutes starting around midnight, once per day.
BruceLee
Forum Regular

Re: 99-raid-check

It doesn't have to be a weekly script. A combination, or something else entirely, is still possible.
Maybe asl-webapp-inventory, but I don't think so. It's somewhat resource-intensive too, but it only runs weekly if you set it to weekly in ASL.
Check the last item in the ASL GUI configuration.
You will have to dig deeper, excluding scripts step by step until you track it down to something. Everything else is just guessing.
premierhosting
Forum Regular

Re: 99-raid-check

Up for a happy 15 days after disabling the 99-raid-check.

Ideas?
Troy McClure
Forum Regular
Posts: 196
Joined: Tue May 10, 2005 1:24 pm

Re: 99-raid-check

I seem to be having a problem with 99-raid-check too. Maybe it's an issue with the newer 1and1 servers, because I just got mine set up and have this problem every time I try to run it. It completely hangs my server and I have to call them to get it rebooted. The load goes through the roof before the server locks up completely and I can't do anything. I did use AOOI to install mine, and I've tried both the ASL kernel and the standard CentOS kernel with the same result.
premierhosting
Forum Regular

Re: 99-raid-check

Troy, I've been fine since disabling it. I feel like I'm missing out on something, though.
Troy McClure
Forum Regular

Re: 99-raid-check

Yeah, I just disabled it too. I'd still like to find out why this happens on my newer 1and1 server and not on my older one. On the older box it does make the load jump, but nothing too bad, and nothing like what I see on the newer box. On the newer box it runs for about 10 minutes and then stops responding, and I have to reboot it. I have been able to see the load right before it stops responding completely, and the CPU load is at 30.
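If the load spike during the check is the trigger, one commonly used mitigation is to cap md's check/resync bandwidth through /proc/sys/dev/raid/speed_limit_max (KB/s per device; sysctl name dev.raid.speed_limit_max). That slows the check rather than fixing any underlying kernel or hardware problem, and whether it would prevent these particular hangs is untested. A minimal sketch; the function name and the value in the usage line are assumptions:

```bash
# Hypothetical helper: cap md check/resync speed (KB/s per device).
# Writing the real /proc file needs root and does not survive a reboot
# unless the same value is also set in /etc/sysctl.conf.
set_raid_speed_max() {
    local kbps="$1"
    local knob="${2:-/proc/sys/dev/raid/speed_limit_max}"
    case "$kbps" in
        ''|*[!0-9]*)
            echo "speed must be a whole number of KB/s" >&2
            return 1
            ;;
    esac
    echo "$kbps" > "$knob"
}

# Usage (as root), value picked arbitrarily for illustration:
#   set_raid_speed_max 10000
```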
premierhosting
Forum Regular

Re: 99-raid-check

Anyone have the raid-check cron running that *is not* crashing their server?
BruceLee
Forum Regular

Re: 99-raid-check

Yes, I do (also on a 1and1 server).
premierhosting
Forum Regular

Re: 99-raid-check

Hi BruceLee,

Can you post your output of:

[root@server1]# uname -a
Linux xxxx.xxxxx.xxx 2.6.32.28-1.art.x86_64 #1 SMP Mon Feb 14 11:06:49 EST 2011 x86_64 x86_64 x86_64 GNU/Linux