Bayesian shenanigans (i.e. problems)

Jeff A. Earickson jaearick at COLBY.EDU
Thu Jan 22 21:19:45 GMT 2004


Julian,

My nightly cron script does:

LOGFILE=/var/tmp/learn.spam.log
PREFS=/opt/MailScanner/etc/spam.assassin.prefs.conf
SALEARN=/opt/perl5/bin/sa-learn

$SALEARN --prefs-file=$PREFS --rebuild --force-expire

before doing the ham/spam learning.  I too have noticed plenty of
bayes_toks.expire$$ files in /var/spool/spamassassin, along with
"bayes locked" messages in syslog.  My setup: Solaris 9, MS 4.25-14,
SA 2.63, perl 5.8.2, sophos, clamav-0.65 and razor2.
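
In case it is useful, here is a minimal sketch of how that expiry step
could be wrapped so that two overlapping cron runs never expire the
database at the same time.  The lock directory LOCKDIR and the function
name expire_bayes are made up for illustration; the paths are the ones
from my script above, and the sa-learn options are the same ones I
already use.  Untested beyond a quick dry run, so treat it as a starting
point only.

```shell
#!/bin/sh
# Sketch only: LOCKDIR and expire_bayes are hypothetical names, not part of
# MailScanner or SpamAssassin.  Paths default to the ones in my cron script.
LOGFILE=${LOGFILE:-/var/tmp/learn.spam.log}
PREFS=${PREFS:-/opt/MailScanner/etc/spam.assassin.prefs.conf}
SALEARN=${SALEARN:-/opt/perl5/bin/sa-learn}
LOCKDIR=${LOCKDIR:-/var/tmp/learn.spam.lock}

expire_bayes() {
    # mkdir is atomic: it fails if another run already holds the lock.
    if ! mkdir "$LOCKDIR" 2>/dev/null; then
        echo "expiry already running, skipping" >>"$LOGFILE"
        return 1
    fi
    # Same command as in the cron script, with output kept in the log.
    "$SALEARN" --prefs-file="$PREFS" --rebuild --force-expire >>"$LOGFILE" 2>&1
    status=$?
    # Release the lock whether or not sa-learn succeeded.
    rmdir "$LOCKDIR"
    return $status
}
```

Calling expire_bayes at the top of the nightly script, before the
ham/spam learning, would take the place of the bare sa-learn line.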

Jeff Earickson
Colby College

On Thu, 22 Jan 2004, Julian Field wrote:

> Date: Thu, 22 Jan 2004 20:14:37 +0000
> From: Julian Field <mailscanner at ECS.SOTON.AC.UK>
> Reply-To: MailScanner mailing list <MAILSCANNER at JISCMAIL.AC.UK>
> To: MAILSCANNER at JISCMAIL.AC.UK
> Subject: Re: Bayesian shenanigans (i.e. problems)
>
> What's the exact command people type to do the database expiry?
>
> Just --force-expire or --rebuild as well?
>
> I need to know what to make the code do.
>
> The code is nearly there (but untested).
>
> At 17:21 22/01/2004, you wrote:
> >At 16:52 22/01/2004, you wrote:
> >>On Thu, 22 Jan 2004, Steve Freegard wrote:
> >>
> >> > I haven't been following this thread closely, so apologies if this has
> >> > already been covered.
> >>
> >>It hasn't, so your reply is appreciated!
> >>
> >> > Maybe the error is being caused by opportunistic bayes expiry which could
> >> > take long enough on your system to cause MailScanner to time-out and kill
> >> > off SA mid-expiry causing your orphaned files??
> >>
> >>That sounds very plausible.  I have gone even deeper into the "maillog"
> >>files, and these "Delete bayes ..." entries for a particular MS process
> >>occur 40 seconds after it starts the spam analysis.  And the MS conf has
> >>an SA timeout of 40 seconds.  It all fits.
> >>
> >>So very promising indeed.
> >>
> >> > You could try setting 'bayes_auto_expire 0' in spam.assassin.prefs.conf
> >> > and then creating a nightly cron job to run a script that does:
> >> > 'sa-learn -p /etc/MailScanner/spam.assassin.prefs.conf --rebuild --force-expire'
> >>
> >>Yes, that might be worth a try, at least as proof of concept.
> >>
> >>But I wonder whether we need a cleaner solution (remember, a few other
> >>folk have seen one or other variant of this) that, as default behaviour,
> >>tries to prevent this.  Two possibilities:
> >>
> >>1. MS installation-time (and defaults):  MS defaults 'bayes_auto_expire 0'
> >>    and accompanies that with setting the cron job?  But setting the cron
> >>    job is highly OS-specific (i.e. variable!), and overall this doesn't
> >>    feel quite right.
> >>
> >>2. MS run-time: MS defaults 'bayes_auto_expire 0', but at start up (which
> >>    it generally does every four hours) it does "--rebuild --force-expire",
> >>    preferably (if possible) by the appropriate subroutine call to SA.
> >>
> >>This second feels better and cleaner (although there's a residual issue of
> >>the near simultaneous start-up of around five MS processes).
> >>
> >>Julian: Do you have any thoughts?  I'd be happy to try to cobble together
> >>a proof of concept patch for that second version (although I'd prefer it
> >>if it arrived fully-fledged on the doorstep!).
> >
> >The trouble with option 2 is that the child processes start up completely
> >independently of each other, and doing it once at the startup of every
> >child process would cause a huge holdup while all n children (n could
> >easily be 12 on a dual-CPU box) ran their own bayes-expire. However, there
> >are ways around this, as there always are, so I may be able to come up with
> >a better solution that would do a bayes expire approximately once every 24
> >hours or so, which should be plenty. The whole system would have to sit and
> >hang while this took place, unless I temporarily disabled SpamAssassin (or
> >*possibly* even just bayes) while it was doing it.
> >
> >This is going to be a bit of a pig to write :-(
> >--
> >Julian Field
> >www.MailScanner.info
> >MailScanner thanks transtec Computers for their support
> >
> >PGP footprint: EE81 D763 3DB0 0BFD E1DC 7222 11F6 5947 1415 B654
>
> --
> Julian Field
> www.MailScanner.info
> Professional Support Services at www.MailScanner.biz
> MailScanner thanks transtec Computers for their support
> PGP footprint: EE81 D763 3DB0 0BFD E1DC 7222 11F6 5947 1415 B654
>
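
On the "approximately once every 24 hours" idea quoted above: one way
to decide whether an expiry is due is a timestamp file that records
when the last one finished.  The sketch below is only an illustration
of that check in shell, not what MailScanner does or will do.  The
names STAMP, INTERVAL, expiry_due and mark_expired are invented, and it
assumes a date command that understands +%s, which not every platform
provides.

```shell
#!/bin/sh
# Sketch only: STAMP, INTERVAL, expiry_due and mark_expired are
# hypothetical names chosen for this example.
STAMP=${STAMP:-/var/tmp/bayes.expire.stamp}
INTERVAL=${INTERVAL:-86400}   # seconds between expiry runs (about 24 hours)

now() {
    # Assumes date(1) supports +%s (GNU and BSD do; some systems do not).
    date +%s
}

expiry_due() {
    # Due if no expiry has been recorded yet, or the last one is too old.
    [ -f "$STAMP" ] || return 0
    last=$(cat "$STAMP")
    [ $(( $(now) - ${last:-0} )) -ge "$INTERVAL" ]
}

mark_expired() {
    # Record the time of the expiry that just finished.
    now >"$STAMP"
}
```

A caller would run the expiry only when expiry_due succeeds and call
mark_expired afterwards.  With several processes starting at nearly the
same moment, this check alone is not enough; some form of locking would
still be needed so that only one of them does the work.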


