Bayesian shenanigans (i.e. problems)

David Lee t.d.lee at DURHAM.AC.UK
Tue Jan 20 10:00:18 GMT 2004

On Fri, 16 Jan 2004, Peter Bates wrote:

> >>> mkettler at EVI-INC.COM 15/01/04 15:56:50 >>>
> At 05:38 AM 1/15/2004, David Lee wrote:
> >"Me, too!" (bayes_toks ~ 50MB, ~ 1.4GB).  Glad I'm not
> >alone.
> Yes, those sizes are unreasonable... It sounds like expiry is never
> running on your system.
> Try running expiry manually using sa-learn --force-expire and see if it
> clears things up.
> Well, I've done a --force-expire, and got:
> -rw-r--r--    1 postfix  postfix       40M Jan 16 14:51 bayes_seen
> -rw-------    1 postfix  postfix      123k Jan 16 14:51 bayes_journal
> -rw-------    1 postfix  postfix      265M Jan 16 14:51 bayes_toks
> -rw-------    1 postfix  postfix      2.7G Jan 16 13:08
> -rw-r--r--    1 postfix  postfix      4.8M Oct 15 09:22 old_bayes_seen
> -rw-r--r--    1 postfix  postfix       22M Oct 15 09:22 old_bayes_toks
> now... and my SA/MS is timing out once again, now I've re-enabled Bayes
> with use_bayes...
> I'm almost tempted to have a normal SA run without Bayes, and then use
> MCP to reprocess the message again with Bayes (or vice versa)... the
> fact that the Bayes is making it time out, and then effectively timing
> out the rest of the stuff despite it probably being 'positive' in a lot
> of cases is proving far from jolly...

Hmmm... "sa-learn --force-expire --rebuild", for SA 2.61, seems to help
sometimes.  But that is soon to be history, replaced by another problem!
Executive warning:  If you were suffering from this problem, and are
thinking of moving to 2.62, then check the following beforehand.

In 2.62, the SA folk seem to have recognised the 2.61 "bayes_toks"
problem, and instead of "" are now using filename patterns
"bayes_toks.expire$$" (where $$ is the process id).  (Do a diff of the
2.61 and 2.62 versions of "lib/Mail/SpamAssassin/".)
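To pull just the expiry-related changes out of such a diff, a small script
can help.  What follows is a minimal sketch in Python, assuming the two
releases have been unpacked side by side and that the relevant code lives
in .pm files; the directory names and the "expire" search string are
assumptions for illustration, and it makes no claim about which module
under "lib/Mail/SpamAssassin/" actually changed:

```python
import difflib
from pathlib import Path


def diff_trees(old_root, new_root, pattern="*.pm"):
    """Map relative path -> unified-diff lines for files (matching `pattern`)
    that exist under both trees and differ between them."""
    old_root, new_root = Path(old_root), Path(new_root)
    result = {}
    for old_file in sorted(old_root.rglob(pattern)):
        rel = old_file.relative_to(old_root)
        new_file = new_root / rel
        if not new_file.is_file():
            continue  # removed or renamed in the newer release; skipped here
        old_lines = old_file.read_text(errors="replace").splitlines(keepends=True)
        new_lines = new_file.read_text(errors="replace").splitlines(keepends=True)
        diff = list(difflib.unified_diff(old_lines, new_lines,
                                         fromfile=f"old/{rel}", tofile=f"new/{rel}"))
        if diff:
            result[str(rel)] = diff
    return result


def lines_mentioning(diff_map, needle="expire"):
    """Added/removed lines containing `needle`, as (relative path, line)."""
    hits = []
    for rel, diff in diff_map.items():
        for line in diff[2:]:  # skip the "---" / "+++" file header lines
            if line.startswith(("+", "-")) and needle in line:
                hits.append((rel, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python diff_sa.py Mail-SpamAssassin-2.61/lib Mail-SpamAssassin-2.62/lib
    for rel, line in lines_mentioning(diff_trees(sys.argv[1], sys.argv[2])):
        print(f"{rel}: {line}")
```

Filtering on a single keyword keeps the output short, at the cost of
missing related changes that do not mention it; drop the filter and read
the full diff if in doubt.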

BUT... the result is that instead of one huge "" file, there
now seems to be an increasing number of orphaned "bayes_toks.expire$$"
files.  (Given that $$ could typically be any integer up to about 30,000,
the accumulated disk usage could become 'interesting'...)
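One way to keep an eye on this is to list the "bayes_toks.expire<pid>"
files whose creating process no longer exists and total their size.  The
following is a minimal Python sketch, assuming a POSIX system and that the
number after "expire" is the PID of the process that created the file; the
function names and the Bayes directory path are hypothetical:

```python
import os
import re
from pathlib import Path

# Assumed 2.62-style temporary name: "bayes_toks.expire" followed by a PID.
EXPIRE_RE = re.compile(r"^bayes_toks\.expire(\d+)$")


def pid_alive(pid):
    """True if some process with this PID currently exists (POSIX only)."""
    if pid <= 0:
        return False  # 0 or negative would address process groups, not a PID
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def find_orphans(bayes_dir, is_alive=pid_alive):
    """Return (path, size in bytes) for expire files whose PID is not running."""
    orphans = []
    for entry in sorted(Path(bayes_dir).iterdir()):
        m = EXPIRE_RE.match(entry.name)
        if m and entry.is_file() and not is_alive(int(m.group(1))):
            orphans.append((entry, entry.stat().st_size))
    return orphans


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python find_orphans.py /path/to/bayes/directory
    found = find_orphans(sys.argv[1])
    for path, size in found:
        print(f"{size:>12}  {path}")
    print(f"{sum(s for _, s in found)} bytes in {len(found)} orphaned file(s)")
```

Because PIDs are recycled, a live PID only means that some process with
that number exists, not that it is still an SA expiry run (and a dead one
could in principle belong to a file something else is using); treat the
output as candidates to inspect rather than files to delete blindly.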

I realise such SA details take us somewhat off-topic for a strictly
MailScanner list.  But has anyone here had any experience of this with
SA 2.62, or seen it discussed on the SA lists?  (Perhaps I need to rejoin
an SA list, or at least ferret through their recent archives...)


:  David Lee                                I.T. Service          :
:  Systems Programmer                       Computer Centre       :
:                                           University of Durham  :
:            South Road            :
:                                           Durham                :
:  Phone: +44 191 334 2752                  U.K.                  :

More information about the MailScanner mailing list