Bayesian shenanigans (i.e. problems)
David Lee
t.d.lee at DURHAM.AC.UK
Tue Jan 20 10:00:18 GMT 2004
On Fri, 16 Jan 2004, Peter Bates wrote:
> > mkettler at EVI-INC.COM 15/01/04 15:56:50 >>>
> At 05:38 AM 1/15/2004, David Lee wrote:
> >"Me, too!" (bayes_toks ~ 50MB, bayes_toks.new ~ 1.4GB). Glad I'm not
> >alone.
> >Yes those sizes are unreasonable... It sounds like expiry is never
> >running on your system.
>
> >Try running expiry manually using sa-learn --force-expire and see if
> >it clears things up.
>
> Well, I've done a --force-expire, and got:
>
> -rw-r--r-- 1 postfix postfix 40M Jan 16 14:51 bayes_seen
> -rw------- 1 postfix postfix 123k Jan 16 14:51 bayes_journal
> -rw------- 1 postfix postfix 265M Jan 16 14:51 bayes_toks
> -rw------- 1 postfix postfix 2.7G Jan 16 13:08 bayes_toks.new
> -rw-r--r-- 1 postfix postfix 4.8M Oct 15 09:22 old_bayes_seen
> -rw-r--r-- 1 postfix postfix 22M Oct 15 09:22 old_bayes_toks
>
> now... and my SA/MS is timing out once again, now I've re-enabled Bayes
> with use_bayes...
>
> I'm almost tempted to have a normal SA run without Bayes, and then use
> MCP to reprocess the message again with Bayes (or vice versa)... the
> fact that the Bayes is making it time out, and then effectively timing
> out the rest of the stuff despite it probably being 'positive' in a lot
> of cases is proving far from jolly...
Hmmm... "sa-learn --force-expire --rebuild", for SA 2.61, seems to help
sometimes. But that is soon to be history, replaced by another problem!
Executive warning: If you were suffering from this problem, and are
thinking of moving to 2.62, then check the following beforehand.
At 2.62, the SA folk seem to have recognised the 2.61 "bayes_toks"
problem, and instead of "bayes_toks.new" are now using filename patterns
"bayes_toks.expire$$" (where $$ is the process id). (Do a diff of the
2.61 and 2.62 versions of "lib/Mail/SpamAssassin/BayesStore.pm".)
BUT... the result is that instead of one huge "bayes_toks.new" file, there
now seem to be an increasing number of orphaned "bayes_toks.expire$$"
files. (Given that $$ could typically span all integers up to 30,000, the
accumulating disk usage results could become 'interesting'...)
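
For anyone wanting to check their own Bayes directory, here is a rough
sketch (not anything shipped with SA) that lists "bayes_toks.expire$$"
files whose process id no longer belongs to a running process. The function
names are my own invention, and it assumes the suffix is exactly the
decimal pid on a POSIX system. Because pids get reused, a file left by a
dead process can be missed if an unrelated process now holds that pid, so
treat the output as a hint, not a verdict.

```python
import os
import re

# Assumed naming: "bayes_toks.expire" followed directly by the decimal pid.
EXPIRE_RE = re.compile(r"^bayes_toks\.expire(\d+)$")


def pid_alive(pid):
    """Return True if a process with this pid currently exists (POSIX)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks existence; nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def find_orphans(bayes_dir, is_alive=pid_alive):
    """Return (path, size_in_bytes) for expire files whose pid is gone."""
    orphans = []
    for name in sorted(os.listdir(bayes_dir)):
        match = EXPIRE_RE.match(name)
        if not match:
            continue  # not an expire file: bayes_toks, bayes_seen, etc.
        if is_alive(int(match.group(1))):
            continue  # some process still holds this pid; leave it alone
        path = os.path.join(bayes_dir, name)
        orphans.append((path, os.path.getsize(path)))
    return orphans
```

Calling find_orphans() on the directory holding bayes_toks returns the
candidate files with their sizes; summing the sizes shows how much disk
they are taking. It deliberately deletes nothing.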
I realise such SA details take us somewhat off-topic from strict
MailScanner. But has anyone here got any experience of this with SA 2.62,
or monitoring it on SA lists? (Perhaps I need to rejoin an SA list or at
least ferret through their recent archives...)
--
: David Lee I.T. Service :
: Systems Programmer Computer Centre :
: University of Durham :
: http://www.dur.ac.uk/t.d.lee/ South Road :
: Durham :
: Phone: +44 191 334 2752 U.K. :