Spamassassin timeouts - Just an observation
campbell at cnpapers.com
Thu Jan 15 20:00:30 GMT 2009
Kai Schaetzl wrote:
> Steve Campbell wrote on Thu, 15 Jan 2009 12:58:01 -0500:
>> I don't think I'm getting dns timeouts,
> I went to the archives to read some of your earlier replies.
>> I have reduced the number of children on all machines from 5 to 3. This
>> has reduced the total of timeouts - which sort of points to machine
>> capacity. I only use 10 messages per batch. The main machines have 1 GB
>> of RAM.
> You were running 5 MS children with 1 GB of RAM? Each of these children
> might need around 100 MB, so half of that goes to MS+SA alone. You are
> using a *lot* of extra rules. That all adds to RAM. Check what "ps
> waux|grep Mail" says about memory. Do you run clamd or clamav or even the
> clamav module? This also adds to RAM.
I still go back to the fact that two versions ago, this wasn't a
problem. And there were considerably more emails to give to MS/SA. I'm
not arguing, mind you. I think you're probably right about not enough
RAM. I've been begging for it for months, and only after the dramatic
slowdowns and complaints did I get to order it.
> Are you checking load average regularly? What does free tell about memory
> usage and swap?
I keep MailWatch up most of the day to monitor load average when problems
occur. Mornings seem to be slow, with LA well below 1. In the afternoon,
the LA starts climbing along with the input queue backlog. Right now,
we're about 50 minutes behind with about 500 messages waiting.
top tells me I'm using almost all memory, with 200 MB of swap in use.
That was why I started begging for RAM.
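
As a rough sanity check on the RAM figures discussed above, here is a minimal
Python sketch. It assumes the ballpark of about 100 MB per MS child quoted
earlier in the thread (an estimate, not a measurement) and a 1 GB machine. The
function names are made up for illustration.

```python
# Rough estimate of how much RAM the MailScanner children claim.
# ASSUMPTION: ~100 MB per child is the ballpark from the thread, not measured.


def children_footprint_mb(children: int, mb_per_child: float = 100.0) -> float:
    """Total memory (MB) claimed by `children` processes of equal size."""
    return children * mb_per_child


def fraction_of_ram(children: int, total_ram_mb: float,
                    mb_per_child: float = 100.0) -> float:
    """Share of total RAM taken by the children, as a value between 0 and 1."""
    return children_footprint_mb(children, mb_per_child) / total_ram_mb


if __name__ == "__main__":
    total_ram_mb = 1024.0  # the main machines have 1 GB of RAM
    for n in (5, 3):
        share = fraction_of_ram(n, total_ram_mb)
        print(f"{n} children: {children_footprint_mb(n):.0f} MB, "
              f"{share:.0%} of RAM")
```

The real per-child figure should come from ps or top on the machine itself.
With many extra rules loaded it could well be higher than the default used
here.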
> Currently, you have disabled all RBL tests, so timeouts, if there are any,
> won't show for these, of course.
> Have you already timed (a few times) a spamassassin -D --lint run during
> normal production hours?
I wasn't sure I got all the scores zeroed. Just to make sure, I turned
on skip_rbl_checks. This caused the LA to steady out at about 4, where it
would previously fluctuate as high as 8.
Running spamassassin -D --lint is pretty much useless once the queue
starts backing up. It takes its time, and I can't tell whether that's due
to load or to problems with SA. At off-peak times it runs and produces
fairly steady output.
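
To compare lint timings between quiet and busy hours, something like the
following Python sketch could be used. It only wraps a command and records
wall-clock time. The helper names (time_command, summarize, LINT_CMD) are
hypothetical, and it assumes spamassassin is on the PATH when LINT_CMD is
actually run.

```python
import statistics
import subprocess
import time

# ASSUMPTION: spamassassin is installed and on the PATH on the mail server.
LINT_CMD = ["spamassassin", "-D", "--lint"]


def time_command(cmd, runs=3):
    """Run `cmd` `runs` times; return wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Debug output is discarded; only the elapsed time matters here.
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return the min, median and max of a list of durations."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }
```

Calling summarize(time_command(LINT_CMD)) once in the morning and once during
the afternoon backlog gives two sets of numbers to compare. A large gap points
toward load rather than an SA problem, though it does not prove it.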
> If this is a load issue but occurs not too often (300 timeouts out of 10k
> processed messages isn't that bad) you might just use a longer timeout
> setting for SA and/or reduce the size of the message that you hand over
> to SA.
Due to the access file, the server only processes about 7k a day. Tons
are rejected by GreetPause, MTA-level RBL rejection, or access file
REJECTs. This alone adds some load, though not as much as it would
if those messages were processed by MS/SA.
You might have hit on something there with the size to hand over to SA.
I recently had to up this for some large files being emailed in. There's
a lawyer who was photocopying briefs, scanning them, and making a PDF to
send to someone here. The size was around 50MB. If the limit set up in
MS/SA is smaller than the size of the attachment being sent, it doesn't
deliver it and doesn't quarantine it. I wish there was an option to at
least quarantine it, but I haven't found it. We have subsequently
convinced the lawyer to at least break them up. I'll lower this, though
most of the emails coming in are under 10K on average.
>> Next to try the skip_rbl_checks
> you are already skipping!
As I stated earlier, this was just to ensure I got them all. The only
difference it made is that the LA now holds steady around 4, so I
guess I missed a few when zeroing the scores.
Thanks so very much for the very informative reply. Although I've been
using all of this for years, and the skip_rbl_checks used to be a common
option to change, I never thought much about it. Age does that to a person.