We use DAC960 hardware and have seen similar things, usually from
pushing SCSI limits (e.g. too long a cable, improper cabling, etc.).
The drives themselves turn out to be fine, and the errors are logged in
/var/log/messages as well as in dmesg. You can control the drives with
/proc/rd/c0/user_command (where c0 stands for controller 0). You can
see what is going on with

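A minimal sketch of wrapping the user_command interface in a shell function, so a command can be sent to a given controller and the driver's reply read back. The function name `dac960_cmd` and the `PROC_RD` override (default /proc/rd, there only so the function can be tried against a scratch directory) are hypothetical. Reading the reply back by cat-ing the same file is an assumption about the driver, so check it against your kernel's DAC960 documentation before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: send one command to a DAC960 controller and
# print whatever the driver returns when the file is read back.
# PROC_RD is an assumed override (default /proc/rd) for testing
# against a scratch directory instead of real hardware.
dac960_cmd() {
    if [ $# -lt 2 ]; then
        echo "usage: dac960_cmd CONTROLLER COMMAND..." >&2
        return 2
    fi
    ctrl="$1"; shift          # controller number, e.g. 0 for c0
    cmdfile="${PROC_RD:-/proc/rd}/c${ctrl}/user_command"
    if [ ! -w "$cmdfile" ]; then
        echo "cannot write $cmdfile" >&2
        return 1
    fi
    echo "$*" > "$cmdfile"    # e.g. "make-online 0:3"
    cat "$cmdfile"            # read back the reply (assumed driver behavior)
}
```

For example, `dac960_cmd 0 make-online 0:3` writes `make-online 0:3` to /proc/rd/c0/user_command, where 0:3 stands in for the real channel:ID of the drive.
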
I have a script that dumps the status to a port, and a little Visual C
program that our helpdesk uses to monitor the status of the RAID (since
converted to VB). I even wrote a mon script at one point to parse the
output and notify me of a failed drive, and I'm planning on writing a
Nagios module for it (however, this is low priority: since I quit
building things that push the SCSI limits, drives don't fail). Once
notified, you can run

    echo "make-online channel:ID" > /proc/rd/c0/user_command

replacing channel and ID with the correct channel and ID of the drive that is

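The mon/Nagios idea could look roughly like the following sketch. It assumes (unverified here) that /proc/rd/c0/current_status names each physical drive on a line whose first field is its channel:ID, followed by a "Disk Status: <state>, ..." line; `dac960_check` is a hypothetical name. It follows the Nagios plugin convention of exit code 0 for OK, 2 for CRITICAL and 3 for UNKNOWN, so the same function could feed either mon or Nagios.

```shell
#!/bin/sh
# Hypothetical mon/Nagios-style check for one DAC960 controller.
# ASSUMPTION: the status file names each physical drive on a line whose
# first field is "channel:ID", followed by a "Disk Status: <state>, ..."
# line. Confirm this against your driver's real output first.
dac960_check() {
    file="${1:-/proc/rd/c0/current_status}"
    if [ ! -r "$file" ]; then
        echo "UNKNOWN: cannot read $file"
        return 3                       # Nagios: UNKNOWN
    fi
    bad=$(awk '
        # remember the most recent "channel:ID" drive line
        $1 ~ /^[0-9]+:[0-9]+$/ { dev = $1 }
        # collect drives whose state is neither Online nor Standby
        /Disk Status:/ {
            state = $0
            sub(/.*Disk Status: */, "", state)
            sub(/,.*/, "", state)
            if (state != "Online" && state != "Standby") {
                sep = (list == "") ? "" : ", "
                list = list sep dev " " state
            }
        }
        END { print list }' "$file")
    if [ -z "$bad" ]; then
        echo "OK: no failed physical drives"
        return 0                       # Nagios: OK
    fi
    echo "CRITICAL: drives not online: $bad"
    return 2                           # Nagios: CRITICAL
}
```

The check only reports the channel:ID of each drive that is not Online or Standby, which is exactly what the make-online command needs; it deliberately does not bring drives back online by itself, since a flagged drive may really have failed.
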
If you boot off the RAID and lose 2 drives (or, as I often see, a
channel), you will get a kernel panic. If you're mounting /var/spool/mail
on the RAID, you will find your machine almost hangs, just because of the
amount of processing going on trying to find where to put mail on a busy

Hope this helps, and if you have any questions, please feel free to
contact me directly.


On Sun, 2003-01-05 at 18:12, Nick Phillips wrote:
> On Monday, January 6, 2003, at 01:34  pm, Jim Levie wrote:
> > MailScanner bangs on the disk quite a bit as compared to just
> > sendmail/procmail. My suspicion is that the fault is associated with
> > the disk subsystem activity.
> Are you getting log messages from the DAC960 driver at all? You might
> want to check that by, say, fiddling with the control files in /proc
> (sorry, can't remember which ones) to manually take a drive offline
> and see whether it gets logged.
> It's just that I've seen problems with a DAC960 before where there were
> communication errors between the controller and the drives (introduced
> by the drive bay's backplane, IIRC), which caused the drives to be
> marked as bad by the controller, one after the other.
> Once they were all down, kernel panic followed, IIRC.
> What type of server is it (brand, model etc.)?
> Cheers,
> Nick

