Bayesian shenanigans (i.e. problems)

David Lee t.d.lee at DURHAM.AC.UK
Thu Jan 22 16:52:55 GMT 2004


On Thu, 22 Jan 2004, Steve Freegard wrote:

> I haven't been following this thread closely, so apologies if this has
> already been covered.

It hasn't, so your reply is appreciated!

> Maybe the error is being caused by opportunistic bayes expiry which could
> take long enough on your system to cause MailScanner to time-out and kill
> off SA mid-expiry causing your orphaned files??

That sounds very plausible.  I have gone even deeper into the "maillog"
files, and these "Delete bayes ..." messages for a particular MS process
occur 40 seconds after it starts the spam analysis.  And the MS conf has
an SA timeout of 40 seconds.  It all fits.

So very promising indeed.

> You could try setting 'bayes_auto_expire 0' in spam.assassin.prefs.conf and
> then creating nightly cron job to run a script and does an 'sa-learn -p
> /etc/MailScanner/spam.assassin.prefs.conf --rebuild --force-expire'.

Yes, that might be worth a try, at least as proof of concept.
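
For concreteness, a rough sketch of what that proof of concept might look
like.  The prefs path and the sa-learn options are the ones quoted above;
the 03:15 run time is an arbitrary assumption, and the crontab line assumes
a conventional five-field crontab with sa-learn on cron's PATH, which may
well not hold on every OS.

```text
# In /etc/MailScanner/spam.assassin.prefs.conf: switch off opportunistic expiry
bayes_auto_expire 0

# In root's crontab: force the expiry once a night instead (time is arbitrary)
15 3 * * * sa-learn -p /etc/MailScanner/spam.assassin.prefs.conf --rebuild --force-expire
```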

But I wonder whether we need a cleaner solution (remember, a few other
folk have seen one or other variant of this) that, as default behaviour,
tries to prevent this.  Two possibilities:

1. MS installation-time (and defaults):  MS defaults 'bayes_auto_expire 0'
   and accompanies that with setting the cron job?  But setting the cron
   job is highly OS-specific (i.e. variable!), and overall this doesn't
   feel quite right.

2. MS run-time: MS defaults 'bayes_auto_expire 0', but at start up (which
   it generally does every four hours) it does "--rebuild --force-expire",
   preferably (if possible) by the appropriate subroutine call to SA.

This second feels better and cleaner (although there's a residual issue of
the near simultaneous start-up of around five MS processes).
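
One way that residual start-up race might be handled is a simple lock, so
that only one of the near-simultaneous starters does the expiry.  A minimal
sketch follows.  It shells out to sa-learn rather than making the SA
subroutine call preferred above, and the names LOCKDIR, SA_LEARN and
expire_bayes_once (and the lock path) are hypothetical, chosen only for
illustration.

```shell
#!/bin/sh
# Sketch only: serialise a start-up Bayes expiry across several MS processes.
# The prefs path is the one quoted in this thread; LOCKDIR and SA_LEARN are
# hypothetical names, overridable from the environment.
PREFS=${PREFS:-/etc/MailScanner/spam.assassin.prefs.conf}
LOCKDIR=${LOCKDIR:-/tmp/ms-bayes-expire.lock}
SA_LEARN=${SA_LEARN:-sa-learn}

expire_bayes_once() {
    # mkdir is atomic, so only one of the simultaneous starters gets the lock.
    if ! mkdir "$LOCKDIR" 2>/dev/null; then
        echo "skipped: another process holds $LOCKDIR"
        return 0
    fi
    # Same command as the nightly-cron suggestion, run with no SA timeout.
    "$SA_LEARN" -p "$PREFS" --rebuild --force-expire
    status=$?
    # Release the lock and pass sa-learn's exit status back to the caller.
    rmdir "$LOCKDIR"
    return $status
}

# Run only when invoked as "<script> run", so the file can also be sourced.
if [ "${1:-}" = "run" ]; then
    expire_bayes_once
fi
```

The obvious weakness is that a process killed mid-expiry leaves the lock
directory behind, so a real version would need some stale-lock handling.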

Julian: Do you have any thoughts?  I'd be happy to try to cobble together
a proof of concept patch for that second version (although I'd prefer it
if it arrived fully-fledged on the doorstep!).

--

:  David Lee                                I.T. Service          :
:  Systems Programmer                       Computer Centre       :
:                                           University of Durham  :
:  http://www.dur.ac.uk/t.d.lee/            South Road            :
:                                           Durham                :
:  Phone: +44 191 334 2752                  U.K.                  :


