Spamassassin timeouts - Just an observation

Steve Campbell campbell at cnpapers.com
Fri Jan 2 18:27:04 GMT 2009



Martin Hepworth wrote:
> 2009/1/2 Steve Campbell <campbell at cnpapers.com>:
>   
>> Just got back from the holidays, so my reply is a little overdue.
>>
>> Ugo Bellavance wrote:
>>     
>>> Steve Campbell wrote:
>>>       
>>>> The topic seems to come up quite often, and although the answers are
>>>> usually pretty much the same, I never really see much of a "Solved" reply.
>>>>
>>>> I upgraded from version 4.58, where I saw maybe 3 or 4 timeouts, to 4.71,
>>>> and saw an immediate increase to around 100-300 timeouts. I ran all of the
>>>> --debug and --debug-sa flavors of help I could think of. I reviewed the
>>>> logs. I run a caching nameserver. And I zeroed out some RBL scores. I still
>>>> have yet to find why this happens. I eventually upgraded to 4.72, and
>>>> started using clamd. I still get the large numbers of timeouts. I would
>>>> think that the fact that this doesn't happen with all of my large batches
>>>> indicates I'm not using any dead RBLs.
>>>>
>>>> I'm still exploring the causes, but haven't had much luck. I find it odd
>>>> that SA would really keep RBLs that have expired over time in their default
>>>> files, so I really don't think it's that. I do all of my checking of RBLs in
>>>> SA. I always do my configuration and language upgrades, and search for
>>>> rpmnew and rpmsave files. This has happened on 3 different but very similar
>>>> servers that I run.
>>>>
>>>> I'm not really asking for assistance here, but just wanted others
>>>> who are seeing this problem to be aware that there is something unique
>>>> triggering this. I'm fairly confident that it is not happening at all sites,
>>>> but something here is causing it. It may not even be related to MS/SA, but
>>>> totally something else.
>>>>
>>>> The most I could ask for is a small checklist of what to ensure I have
>>>> set. Every time I try to use the debug procedures, the tests perform
>>>> flawlessly with no errors. It is very sporadic. We receive those normal
>>>> bursts of spam, but for the most part, the batches are small. The average
>>>> amount of email per day is usually around 10k emails, but I get the above
>>>> stated 100-300 timeouts. I'm going to try and match batch numbers to
>>>> timeouts and see if this will reveal anything. I only run 3 Children on a
>>>> fairly hefty Dell PowerEdge, but I do use 30 messages per child. I don't
>>>> think this is excessive, though.
>>>>
>>>> Hope everyone has a Happy Holiday.
>>>>         
>>> What is the machine?
>>>
>>>       
>> The machines are all Dell PowerEdge servers. There are three servers
>> involved. Two are well equipped. One is just used as an interface for our
>> webmail users. Not a lot going through it.
>>     
>>> Did you check the optimization section of the MAQ page on the wiki?
>>>       
>> No, I haven't, but I will. I have reviewed it before, but will look to see
>> if anything has changed or been added.
>>     
>>> When running --debug --debug-sa, don't you find anything that is a bit
>>> slow?
>>>       
>> Nothing at all.
>>
>> I would think that if something were causing these that were DNS or RBL
>> related, it would show for most all of the batches, not just random batches.
>> So I am guessing it is either network clutter or something else. I just
>> don't know yet. But still, there is the situation where this all started to
>> happen after an upgrade. I'm going to review the upgraded conf files and
>> see if I've missed something.
>>
>> I have reduced the number of children on all machines from 5 to 3. This has
>> reduced the total of timeouts - which sort of points to machine capacity. I
>> only use 10 messages per batch. The main machines have 1 GB of RAM. The
>> actual number of emails going through MS is quite low; around 10K, but I
>> have quite a large access file, and the number of emails getting to the
>> machines is closer to 25k+.
>>
>>
>> Thanks for the thoughts and ideas. I'll keep digging and maybe find
>> something.
>>
>> steve
>
>
> Steve
>
> 1GB of RAM is pretty minimal for SA. It depends what third-party rules
> you've got, but I'd consider increasing the RAM.
>
> I presume you've got a local caching nameserver and you've dropped
> most of the RBLs by giving them a zero score. Also try using
> OpenDNS as your forwarding DNS servers, which can operate a lot quicker
> than a lot of ISPs' DNS.
>
>   

Martin,

I see in 'top' that I am very thin on RAM at times, but it still doesn't 
definitively explain the randomness of the timeouts. We run our own DNS 
servers, and I use a caching nameserver on each server. We also use 
OpenDNS for certain purposes, but not for the mail server instances.
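
For anyone who wants to rule the resolver in or out, here is a rough Python sketch that times a handful of lookups through whatever resolver the box is configured to use. The function name time_lookups and the idea of passing hostnames on the command line are my own assumptions for illustration, not anything MailScanner or SA provides. A caching nameserver will answer repeat queries quickly, so the first run against fresh names is the interesting one.

```python
import socket
import sys
import time


def time_lookups(names, resolver=socket.gethostbyname):
    """Return a list of (name, seconds, ok) tuples, one per lookup.

    Failed lookups are recorded with ok=False instead of raising,
    so one dead zone does not hide the timings of the others.
    """
    results = []
    for name in names:
        start = time.perf_counter()
        try:
            resolver(name)
            ok = True
        except OSError:  # socket.gaierror is a subclass of OSError
            ok = False
        results.append((name, time.perf_counter() - start, ok))
    return results


if __name__ == "__main__":
    # Hostnames come from the command line; nothing is queried if none are given.
    for name, secs, ok in time_lookups(sys.argv[1:]):
        status = "ok" if ok else "FAILED"
        print(f"{name:40s} {secs * 1000:8.1f} ms  {status}")
```

Running it a few times during a quiet period and again during a spam burst should show whether lookups slow down when the timeouts appear.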

I guess the problem is more about the randomness. I don't think the 
upgrade of MS would have caused such a large difference. I was running 
SA 3 before and after the upgrade, so there shouldn't have been a large 
increase there. Now there could have been a big difference in the way 
SA was acting, but I'm not aware (ignorant is probably a better 
adjective for my knowledge) of any great changes.
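
To get a handle on the randomness, something like the following Python sketch could bucket the timeouts by hour from the mail log. It assumes a classic syslog timestamp at the start of each line and that the timeout entries contain the text "SpamAssassin timed out"; both are assumptions about the log format and should be checked against a real log before trusting the counts. The function name timeouts_per_hour is made up for this example.

```python
import re
import sys
from collections import Counter

# Assumed wording of the timeout entry; adjust to match your own log.
TIMEOUT_RE = re.compile(r"SpamAssassin timed out", re.IGNORECASE)
# Classic syslog prefix, e.g. "Jan  2 18:27:04 host ..."
STAMP_RE = re.compile(r"^(\w{3})\s+(\d{1,2})\s+(\d{2}):\d{2}:\d{2}\s")


def timeouts_per_hour(lines, pattern=TIMEOUT_RE):
    """Count matching log lines, keyed by (month, day, hour)."""
    counts = Counter()
    for line in lines:
        if not pattern.search(line):
            continue
        stamp = STAMP_RE.match(line)
        if stamp:
            month, day, hour = stamp.groups()
            counts[(month, int(day), int(hour))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python timeouts.py /path/to/maillog
    with open(sys.argv[1], errors="replace") as fh:
        # Counter keeps insertion order, so output follows log order.
        for (month, day, hour), n in timeouts_per_hour(fh).items():
            print(f"{month} {day:2d} {hour:02d}:00  {n}")
```

If the counts cluster around particular hours (backups, cron jobs, rule updates), that would point at machine load more than at DNS or RBLs.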

I am aware of the .cf file I can view to discover the RBLs that SA uses, 
so I could start zeroing out a lot of those. Does anyone, though, have a 
recommendation for what should be used (non-zero entries) as a general rule?
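
As a starting point for the zeroing-out, here is a rough Python sketch that pulls the rule names out of a .cf file by looking for header rules that call check_rbl or one of its variants, and prints a candidate "score NAME 0" line for each. The function names are invented for this example, and the path to the .cf file is passed on the command line because it varies by install. It makes no judgement about which lists are worth keeping, and I have not verified whether zeroing only the sub-rules stops the underlying lookup.

```python
import re
import sys

# Matches e.g. "header RCVD_IN_FOO eval:check_rbl('set', 'zone.example.')"
# as well as variants such as check_rbl_sub(...). Commented lines do not match.
RBL_RULE_RE = re.compile(r"^\s*header\s+(\w+)\s+eval:check_rbl(?:_\w+)?\(")


def rbl_rule_names(lines):
    """Return RBL rule names in file order, without duplicates."""
    seen = []
    for line in lines:
        match = RBL_RULE_RE.match(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


def zero_score_lines(names):
    """Build 'score NAME 0' lines for review before adding to a local .cf."""
    return [f"score {name} 0" for name in names]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python rbl_rules.py /path/to/dnsbl_rules.cf
    with open(sys.argv[1], errors="replace") as fh:
        for line in zero_score_lines(rbl_rule_names(fh)):
            print(line)
```

The output is meant to be reviewed and trimmed by hand, then placed in a local override file, so the stock rule files stay untouched across upgrades.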

Thanks
