Spamassassin timeouts - Just an observation

Steve Campbell campbell at cnpapers.com
Fri Jan 2 13:59:27 GMT 2009


Just got back from the holidays, so my reply is a little overdue.

Ugo Bellavance wrote:
> Steve Campbell wrote:
>> The topic seems to come up quite often, and although the answers are 
>> usually pretty much the same, I never really see much of a "Solved" 
>> reply.
>>
>> I upgraded from version 4.58, where I saw maybe 3 or 4 timeouts, to 
>> 4.71, and saw an immediate increase to around 100-300 timeouts. I ran 
>> all of the --debug and --debug-sa flavors of help I could think of. I 
>> reviewed the logs. I run a caching nameserver. And I zeroed out some 
>> RBL scores. I still have yet to find why this happens. I eventually 
>> upgraded to 4.72, and started using clamd. I still get the large 
>> numbers of timeouts. I would think that the fact that this doesn't 
>> happen with all of my large batches indicates I'm not using any dead 
>> RBLs.
>>
>> I'm still exploring the causes, but haven't had much luck. I find it 
>> odd that SA would really keep RBLs that have expired over time in 
>> their default files, so I really don't think it's that. I do all of 
>> my checking of RBLs in SA. I always do my configuration and language 
>> upgrades, and search for rpmnew and rpmsave files. This has happened 
>> on 3 different but very similar servers that I run.
>>
>> I'm not really asking for assistance here, but just wanted to let 
>> others who are seeing this problem be aware that there is 
>> something unique triggering this. I'm fairly confident that it is not 
>> happening at all sites, but something here is causing it. It may not 
>> even be related to MS/SA, but totally something else.
>>
>> The most I could ask for is a small checklist of what to ensure I 
>> have set. Every time I try to use the debug procedures, the tests 
>> perform flawlessly with no errors. It is very sporadic. We receive 
>> those normal bursts of spam, but for the most part, the batches are 
>> small. The average amount of email per day is usually around 10k 
>> emails, but I get the above stated 100-300 timeouts. I'm going to try 
>> and match batch numbers to timeouts and see if this will reveal 
>> anything. I only run 3 Children on a fairly hefty Dell PowerEdge, but 
>> I do use 30 messages per child. I don't think this is excessive, though.
>>
>> Hope everyone has a Happy Holiday.
>
> What is the machine?
>
The machines are all Dell PowerEdge servers. There are three servers 
involved. Two are well equipped. One is just used as an interface for 
our webmail users, so not a lot goes through it.

> Did you check the optimization section of the MAQ page on the wiki?

No, I haven't, but I will. I have reviewed it before, but will look to 
see if anything has changed or been added.
>
> When running --debug --debug-sa, don't you find anything that is a bit 
> slow?

Nothing at all.

I would think that if DNS or an RBL were causing these, the timeouts 
would show up in nearly all of the batches, not just random ones. So I 
am guessing it is either network clutter or something else. I just 
don't know yet. Still, all of this started happening after an upgrade, 
so I'm going to review the upgraded conf files and see if I've missed 
something.
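
For reviewing config files left behind by an upgrade, a minimal sketch 
along these lines could list any .rpmnew and .rpmsave leftovers. The 
function name find_rpm_leftovers is hypothetical, and the directories to 
scan are passed on the command line; /etc/MailScanner and 
/etc/mail/spamassassin are only assumed typical locations and may differ 
on your system.

```python
import os
import sys


def find_rpm_leftovers(root):
    """Yield paths under root whose names end in .rpmnew or .rpmsave."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith((".rpmnew", ".rpmsave")):
                yield os.path.join(dirpath, name)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example (paths are assumptions):
    #   python leftovers.py /etc/MailScanner /etc/mail/spamassassin
    for root in sys.argv[1:]:
        for path in find_rpm_leftovers(root):
            print(path)
```

Each path printed is a file worth diffing against the live config it 
sits next to.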

I have reduced the number of children on all machines from 5 to 3. This 
has reduced the total number of timeouts, which sort of points to 
machine capacity. I only use 10 messages per batch. The main machines 
have 1 GB of RAM. The actual number of emails going through MS is quite 
low, around 10K, but I have quite a large access file, and the number 
of emails getting to the machines is closer to 25k+.
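
One way to check whether timeouts track batch size (and so machine 
load) is to pair each timeout line in the mail log with the most recent 
batch announced by the same MailScanner child. The sketch below is only 
an illustration: the two log message patterns are assumptions about what 
MailScanner writes to syslog and should be adjusted to match real log 
lines, and timeouts_by_batch_size is a hypothetical helper name.

```python
import re
import sys
from collections import Counter

# Assumed message texts; adjust both patterns to match your own maillog.
BATCH_RE = re.compile(r"MailScanner\[(\d+)\]: New Batch: Scanning (\d+) messages")
TIMEOUT_RE = re.compile(r"MailScanner\[(\d+)\]: SpamAssassin timed out")


def timeouts_by_batch_size(lines):
    """Count timeouts keyed by the size of the batch the child last started."""
    current = {}        # child PID -> size of its most recent batch
    counts = Counter()  # batch size (None if unknown) -> number of timeouts
    for line in lines:
        m = BATCH_RE.search(line)
        if m:
            current[m.group(1)] = int(m.group(2))
            continue
        m = TIMEOUT_RE.search(line)
        if m:
            counts[current.get(m.group(1))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example (log path is an assumption): python timeouts.py /var/log/maillog
    with open(sys.argv[1], errors="replace") as fh:
        result = timeouts_by_batch_size(fh)
    # Sort known batch sizes first, then the "unknown batch" bucket.
    for size, n in sorted(result.items(), key=lambda kv: (kv[0] is None, kv[0] or 0)):
        print(f"batch size {size}: {n} timeouts")
```

If the timeouts cluster on the largest batches, that would point toward 
capacity; if they are spread evenly across batch sizes, something 
external such as DNS or the network becomes more likely.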


Thanks for the thoughts and ideas. I'll keep digging and maybe find 
something.

steve


