Bayesian shenanigans (i.e. problems)
mailscanner at ecs.soton.ac.uk
Thu Jan 22 20:14:37 GMT 2004
What's the exact command people type to do the database expiry?
Just --force-expire, or --rebuild as well?
I need to know what to make the code do.
The code is nearly there (but untested).
At 17:21 22/01/2004, you wrote:
>At 16:52 22/01/2004, you wrote:
>>On Thu, 22 Jan 2004, Steve Freegard wrote:
>> > I haven't been following this thread closely, so apologies if this has
>> > already been covered.
>>It hasn't, so your reply is appreciated!
>> > Maybe the error is being caused by opportunistic bayes expiry which could
>> > take long enough on your system to cause MailScanner to time-out and kill
>> > off SA mid-expiry causing your orphaned files??
>>That sounds very plausible. I have gone even deeper into the "maillog"
>>files, and these "Delete bayes ..." for a particular MS process occur
>>40 seconds after it starts the spam analysis. And the MS conf has SA
>>timeout of 40 seconds. It all fits.
>>So very promising indeed.
>> > You could try setting 'bayes_auto_expire 0' in spam.assassin.prefs.conf and
>> > then creating a nightly cron job to run a script that does an 'sa-learn -p
>> > /etc/MailScanner/spam.assassin.prefs.conf --rebuild --force-expire'.
>>Yes, that might be worth a try, at least as proof of concept.
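For illustration, the nightly job suggested above might look something like the sketch below. Only the sa-learn command line comes from the message above; the script path, the cron schedule, the "run" argument and the names SA_LEARN, PREFS and bayes_expire are assumptions made up for this example. It also assumes 'bayes_auto_expire 0' is already set in spam.assassin.prefs.conf.

```shell
#!/bin/sh
# Hypothetical wrapper script, e.g. saved as /usr/local/sbin/bayes-expire
# (the path and all names here are assumptions, not part of MailScanner).
#
# Example crontab entry (time of day chosen arbitrarily):
#   30 3 * * * /usr/local/sbin/bayes-expire run

PREFS="${PREFS:-/etc/MailScanner/spam.assassin.prefs.conf}"
SA_LEARN="${SA_LEARN:-sa-learn}"

# Run the manual expiry with the options quoted in the thread.
bayes_expire() {
    "$SA_LEARN" -p "$PREFS" --rebuild --force-expire
}

# Only do the work when called with "run", so the file can also be
# sourced from another script without side effects.
if [ "${1:-}" = "run" ]; then
    bayes_expire
fi
```

Running it from cron keeps the expiry outside the 40 second SpamAssassin timeout, which is the point of the suggestion.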
>>But I wonder whether we need a cleaner solution (remember, a few other
>>folk have seen one or other variant of this) that, as default behaviour,
>>tries to prevent this. Two possibilities:
>>1. MS installation-time (and defaults): MS defaults 'bayes_auto_expire 0'
>> and accompanies that with setting the cron job? But setting the cron
>> job is highly OS-specific (i.e. variable!), and overall this doesn't
>> feel quite right.
>>2. MS run-time: MS defaults 'bayes_auto_expire 0', but at start up (which
>> it generally does every four hours) it does "--rebuild --force-expire",
>> preferably (if possible) by the appropriate subroutine call to SA.
>>This second feels better and cleaner (although there's a residual issue of
>>the near simultaneous start-up of around five MS processes).
>>Julian: Do you have any thoughts? I'd be happy to try to cobble together
>>a proof of concept patch for that second version (although I'd prefer it
>>if it arrived fully-fledged on the doorstep!).
>The trouble with option 2 is that the child processes start up completely
>independently of each other, and doing it once at the startup of every
>child process would cause a huge holdup while all n children (n could
>easily be 12 on a dual-CPU box) ran their own bayes-expire. However, there
>are ways around this, as there always are, so I may be able to come up with
>a better solution that would do a bayes expire approximately once every 24
>hours or so, which should be plenty. The whole system would have to sit and
>hang while this took place, unless I temporarily disabled SpamAssassin (or
>*possibly* even just bayes) while it was doing it.
>This is going to be a bit of a pig to write :-(
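One possible shape for the "approximately once every 24 hours" idea, as a rough sketch under stated assumptions rather than what MailScanner actually does: each process tries to take a lock, and only the one that gets it checks a timestamp and runs the expiry if it is due. STATE_DIR, INTERVAL, run_expire and maybe_expire are hypothetical names, and the state directory path is invented for the example. It relies on 'date +%s', which GNU and BSD date support.

```shell
#!/bin/sh
# Sketch only. STATE_DIR, INTERVAL, run_expire and maybe_expire are
# hypothetical names for illustration, not part of MailScanner.

STATE_DIR="${STATE_DIR:-/var/spool/MailScanner/bayes-expire}"
INTERVAL="${INTERVAL:-86400}"    # seconds between expiry runs (24 hours)
PREFS="${PREFS:-/etc/MailScanner/spam.assassin.prefs.conf}"

# The expiry itself, using the command quoted earlier in the thread.
run_expire() {
    sa-learn -p "$PREFS" --rebuild --force-expire
}

# Run the expiry if the last successful run is older than INTERVAL
# and no other process currently holds the lock.
maybe_expire() {
    me_lock="$STATE_DIR/lock"
    me_stamp="$STATE_DIR/last-run"

    mkdir -p "$STATE_DIR" || return 1

    # mkdir without -p fails if the directory already exists, so at most
    # one process gets past this line at a time; the rest skip the expiry.
    mkdir "$me_lock" 2>/dev/null || return 0

    me_now=$(date +%s)
    me_last=0
    if [ -f "$me_stamp" ]; then
        me_last=$(cat "$me_stamp")
    fi

    me_status=0
    if [ $((me_now - me_last)) -ge "$INTERVAL" ]; then
        if run_expire; then
            # Record the time only after a successful expiry.
            echo "$me_now" > "$me_stamp"
        else
            me_status=1
        fi
    fi

    rmdir "$me_lock"
    return "$me_status"
}
```

With n children starting together, one of them would run the expiry and the others would return immediately. This does not address the other children waiting on the Bayes database while the expiry runs, and a process killed mid-expiry would leave the lock directory behind, so a real version would need some stale-lock handling.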
>MailScanner thanks transtec Computers for their support
>PGP footprint: EE81 D763 3DB0 0BFD E1DC 7222 11F6 5947 1415 B654
Professional Support Services at www.MailScanner.biz