Spamassassin timeouts - Just an observation
Ugo Bellavance
ugob at lubik.ca
Sat Jan 3 02:34:05 GMT 2009
Steve Campbell wrote:
>
>
> Martin Hepworth wrote:
>> 2009/1/2 Steve Campbell <campbell at cnpapers.com>:
>>
>>> Just got back from the holidays, so my reply is a little overdue.
>>>
>>> Ugo Bellavance wrote:
>>>
>>>> Steve Campbell wrote:
>>>>
>>>>> The topic seems to come up quite often, and although the answers are
>>>>> usually pretty much the same, I never really see much of a "Solved"
>>>>> reply.
>>>>>
>>>>> I upgraded from version 4.58, where I saw maybe 3 or 4 timeouts, to
>>>>> 4.71, and saw an immediate increase to around 100-300 timeouts. I ran
>>>>> all of the --debug and --debug-sa flavors of help I could think of. I
>>>>> reviewed the logs. I run a caching nameserver. And I zeroed out some
>>>>> RBL scores. I have yet to find why this happens. I eventually
>>>>> upgraded to 4.72, and started using clamd. I still get large numbers
>>>>> of timeouts. I would think that the fact that this doesn't happen with
>>>>> all of my large batches indicates I'm not using any dead RBLs.
>>>>>
>>>>> I'm still exploring the causes, but haven't had much luck. I find it
>>>>> odd that SA would really keep RBLs that have expired over time in its
>>>>> default files, so I really don't think it's that. I do all of my
>>>>> checking of RBLs in SA. I always do my configuration and language
>>>>> upgrades, and search for rpmnew and rpmsave files. This has happened
>>>>> on 3 different but very similar servers that I run.
>>>>>
>>>>> I'm not really asking for assistance here, but just wanted to let
>>>>> others who are seeing this problem be aware that there is something
>>>>> unique triggering this. I'm fairly confident that it is not happening
>>>>> at all sites, but something here is causing it. It may not even be
>>>>> related to MS/SA, but something else entirely.
>>>>>
>>>>> The most I could ask for is a small checklist of what to ensure I have
>>>>> set. Every time I try to use the debug procedures, the tests perform
>>>>> flawlessly with no errors. It is very sporadic. We receive the normal
>>>>> bursts of spam, but for the most part, the batches are small. The
>>>>> average amount of email per day is usually around 10k emails, but I
>>>>> get the above-stated 100-300 timeouts. I'm going to try to match batch
>>>>> numbers to timeouts and see if this will reveal anything. I only run 3
>>>>> children on a fairly hefty Dell PowerEdge, but I do use 30 messages
>>>>> per child. I don't think this is excessive, though.
>>>>>
>>>>> Hope everyone has a Happy Holiday.
>>>>>
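One way to match timeouts against batches is to bucket the timeout lines in the mail log by hour and see whether they cluster. The sketch below assumes standard syslog timestamps; the "SpamAssassin timed out" pattern is an assumption, since the exact wording differs between MailScanner versions, so check your own log and adjust it.

```python
import re
from collections import Counter

# Assumption: adjust this pattern to the exact text your MailScanner
# version writes to the mail log when a SpamAssassin check times out.
TIMEOUT_RE = re.compile(r"SpamAssassin timed out", re.IGNORECASE)

# Standard syslog prefix, e.g. "Jan  3 02:34:05 host MailScanner[123]: ..."
SYSLOG_RE = re.compile(
    r"^(?P<mon>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+(?P<hour>\d{2}):")

MONTHS = {name: i for i, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def timeouts_per_hour(lines):
    """Count timeout lines per (month, day, hour) so bursts stand out."""
    counts = Counter()
    for line in lines:
        if not TIMEOUT_RE.search(line):
            continue
        m = SYSLOG_RE.match(line)
        if m and m["mon"] in MONTHS:
            counts[(MONTHS[m["mon"]], int(m["day"]), int(m["hour"]))] += 1
    return counts
```

Usage, assuming a Red Hat style log location: open /var/log/maillog, pass the file object to timeouts_per_hour(), and print the sorted items. Hours with many timeouts can then be compared with the batch sizes and load at the same time.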
>>>> What is the machine?
>>>>
>>>>
>>> The machines are all Dell PowerEdge servers. There are three servers
>>> involved. Two are well equipped. One is just used as an interface for
>>> our webmail users. Not a lot going through it.
>>>
>>>> Did you check the optimization section of the MAQ page on the wiki?
>>>>
>>> No, I haven't, but I will. I have reviewed it before, but will look to
>>> see if anything has changed or been added.
>>>
>>>> When running --debug --debug-sa, don't you find anything that is a bit
>>>> slow?
>>>>
>>> Nothing at all.
>>>
>>> I would think that if something DNS- or RBL-related were causing these,
>>> it would show up in almost all of the batches, not just random batches.
>>> So I am guessing it is either network clutter or something else. I just
>>> don't know yet. But still, this all started to happen after an upgrade.
>>> I'm going to review the upgraded conf files and see if I've missed
>>> something.
>>>
>>> I have reduced the number of children on all machines from 5 to 3.
>>> This has reduced the total number of timeouts, which sort of points to
>>> machine capacity. I only use 10 messages per batch. The main machines
>>> have 1 GB of RAM. The actual number of emails going through MS is quite
>>> low, around 10K, but I have quite a large access file, and the number
>>> of emails reaching the machines is closer to 25k+.
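Why fewer children could mean fewer timeouts on a 1 GB machine comes down to simple arithmetic: every child carries its own copy of SpamAssassin, and once the total exceeds physical RAM the box starts swapping. A rough sketch follows; the per-child and baseline figures are made up for illustration and should be replaced with resident sizes read from top or ps.

```python
def memory_budget(children, per_child_mb, baseline_mb, ram_mb):
    """Rough check: do N MailScanner children fit in RAM without swapping?

    per_child_mb: resident size of one child with SpamAssassin loaded
                  (measure it; the figures used below are hypothetical).
    baseline_mb:  everything else on the box (OS, MTA, clamd, DNS cache).
    Returns (fits, headroom_mb); negative headroom suggests swapping.
    """
    needed_mb = children * per_child_mb + baseline_mb
    return needed_mb <= ram_mb, ram_mb - needed_mb


# Hypothetical figures for a 1 GB machine, purely for illustration.
for n in (5, 3):
    print(n, memory_budget(n, per_child_mb=150, baseline_mb=400, ram_mb=1024))
# prints:
# 5 (False, -126)
# 3 (True, 174)
```

With these invented numbers, 5 children overshoot RAM and 3 fit, which is the same direction as the drop in timeouts described above; the real margins depend entirely on the measured sizes.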
>>>
>>>
>>> Thanks for the thoughts and ideas. I'll keep digging and maybe find
>>> something.
>>>
>>> steve
>>>
>>> --
>>> MailScanner mailing list
>>> mailscanner at lists.mailscanner.info
>>> http://lists.mailscanner.info/mailman/listinfo/mailscanner
>>>
>>> Before posting, read http://wiki.mailscanner.info/posting
>>>
>>> Support MailScanner development - buy the book off the website!
>>>
>>>
>>
>>
>> Steve
>>
>> 1GB RAM is pretty minimal for SA... it depends what third-party rules
>> you've got, but I'd consider increasing RAM.
>>
>> I presume you've got a local caching nameserver and you've dropped
>> most of the RBLs by giving them a zero score. Also try using
>> OpenDNS as your forwarders, which can answer a lot quicker
>> than a lot of ISPs' DNS servers.
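A quick way to check whether DNS is slow is to time an RBL lookup through the system resolver twice: if a local caching nameserver is working, the second lookup should return much faster. DNSBL queries reverse the IPv4 octets and append the list's zone (RFC 5782), and 127.0.0.2 is the conventional test entry. The zone name used below is only an example; this is a sketch, not part of MailScanner or SpamAssassin.

```python
import ipaddress
import socket
import time


def dnsbl_query_name(ip, zone):
    """Build a DNSBL query name: reversed IPv4 octets plus the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def time_lookup(name):
    """Resolve name with the system resolver; return (answered, seconds).

    A failed lookup usually means "not listed" (NXDOMAIN), but a resolver
    timeout also lands in the except branch, so check the elapsed time.
    """
    start = time.monotonic()
    try:
        socket.gethostbyname(name)
        answered = True
    except socket.gaierror:
        answered = False
    return answered, time.monotonic() - start
```

Usage: build the name with dnsbl_query_name("127.0.0.2", "zen.spamhaus.org") (example zone) and call time_lookup() on it twice. A slow first lookup followed by a fast second one points at upstream DNS latency; two slow lookups suggest the cache is not being used.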
>>
>>
>
> Martin,
>
> I see in 'top' that I am very thin on RAM at times, but that still doesn't
> definitively explain the randomness of the timeouts. We run our own DNS
> servers, and I use a caching nameserver on each server. We also use
> OpenDNS for certain purposes, but not for the mail servers.
>
> I guess the problem is more about the randomness. I don't think the
> upgrade of MS would have caused such a large difference. I was running
> SA 3 before and after the upgrade, so there shouldn't have been a large
> increase there. Now, there could have been a big difference in the way
> SA was acting, but I'm not aware (ignorant is probably a better
> adjective for my knowledge) of any great changes.
Well, the randomness can simply be caused by swapping. For some reason,
the system loads a little more into RAM than your RAM can hold, and it
starts swapping. As Martin said, 1 GB is minimal for a MailScanner/SA/AV
system. Increasing your batch sizes to 30 may also help. But the first
thing I'd do is add another GB of RAM.
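To confirm that swapping coincides with the timeouts, the Linux kernel's cumulative swap counters in /proc/vmstat (pswpin and pswpout, pages swapped in and out since boot) can be sampled before and after a busy period; the si/so columns of `vmstat 5` show the same thing live. A minimal sketch, assuming a Linux host:

```python
def swap_counters(vmstat_text):
    """Extract pswpin/pswpout (pages swapped in/out since boot)."""
    counters = {}
    for line in vmstat_text.splitlines():
        key, _, value = line.partition(" ")
        if key in ("pswpin", "pswpout"):
            counters[key] = int(value)
    return counters


def swap_activity(before, after):
    """Pages swapped in and out between two snapshots."""
    return {k: after[k] - before[k] for k in ("pswpin", "pswpout")}
```

Usage: read /proc/vmstat into a string once a minute, pass it to swap_counters(), and compare consecutive snapshots with swap_activity(). Non-zero deltas in the same minutes as the SpamAssassin timeouts would support the swapping explanation.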