Changes in Bayes

Stephen Swaney steve.swaney at FSL.COM
Mon Mar 8 01:28:27 GMT 2004


> -----Original Message-----
> From: MailScanner mailing list [mailto:MAILSCANNER at JISCMAIL.AC.UK] On
> Behalf Of Hendrik den Hartog
> Sent: Sunday, March 07, 2004 7:05 PM
> To: MAILSCANNER at JISCMAIL.AC.UK
> Subject: Changes in Bayes
>
> I'm currently using 4:28:3 . I'm aware that there have been several
> changes in the way Bayes 'works' and is managed. I'd appreciate some
> overview on the following.
>
>  My / directory filled up to 100%. In desperation I emptied the
>  bayes_journal, msgcount, seen, toks, toks.new files and this
>  freed up the space again.

The bayes files do not have to live on the / filesystem. Their location may
be set in spam.assassin.prefs.conf by adding a line similar to:

bayes_path <directory_where_bayes_tokens_are_stored>/bayes

Please see http://www.spamassassin.org/doc/Mail_SpamAssassin_Conf.html for
the details. Please note the "/bayes" after the actual directory name. It is
a filename prefix, not a subdirectory, so it causes SpamAssassin to look for
files named "bayes_*" in the named directory.
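
As a rough illustration of how that prefix behaves, the sketch below lists
the files a given bayes_path value would match. The function name
list_bayes_files and the example location are made up for this
illustration; only the "bayes_*" naming comes from the text above.

```shell
#!/bin/sh
# List the files that match a bayes_path value.
# The value is a path PREFIX: "<dir>/bayes" matches "<dir>/bayes_*".
list_bayes_files() {
    prefix=$1
    for f in "${prefix}"_*; do
        # When nothing matches, the shell leaves the pattern unexpanded;
        # the -e test skips that case.
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Example call (hypothetical location):
#   list_bayes_files /var/spool/spamassassin/bayes
```

If this prints nothing for the value you put in bayes_path, the prefix is
probably pointing at the wrong place.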

>
>  Q's - do I now just wait for the bayes files to build up again?
>  or do I need to run some commands?

The default in SpamAssassin is to automatically add tokens to the bayes
database.

I quote from the link referenced above:

----Start quote -------

bayes_auto_learn ( 0 | 1 ) (default: 1)

Whether SpamAssassin should automatically feed high-scoring mails (or
low-scoring mails, for non-spam) into its learning systems. The only
learning system supported currently is a naive-Bayesian-style classifier.
Note that certain tests are ignored when determining whether a message
should be trained upon: - auto-whitelist (AWL) - rules with tflags set to
'learn' (the Bayesian rules) - rules with tflags set to 'userconf' (user
white/black-listing rules, etc)

Also note that auto-training occurs using scores from either scoreset 0 or
1, depending on what scoreset is used during message check. It is likely
that the message check and auto-train scores will be different.


bayes_auto_learn_threshold_nonspam n.nn (default: 0.1)

The score threshold below which a mail has to score, to be fed into
SpamAssassin's learning systems automatically as a non-spam message.

bayes_auto_learn_threshold_spam n.nn (default: 12.0)

The score threshold above which a mail has to score, to be fed into
SpamAssassin's learning systems automatically as a spam message.
Note: SpamAssassin requires at least 3 points from the header, and 3 points
from the body to auto-learn as spam. Therefore, the minimum working value
for this option is 6.

----End quote -------
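
To make the thresholds in the quote a little more concrete, here is a
simplified sketch of the decision they describe. It is not SpamAssassin's
actual code: the function name autolearn_decision is invented, the strict
"below"/"above" comparisons are my reading of the quoted wording, and the
ignored tests and scoreset details mentioned in the quote are left out.

```shell
#!/bin/sh
# Classify a message against the quoted auto-learn thresholds.
# Arguments: total score, header points, body points,
#            optional nonspam threshold (default 0.1),
#            optional spam threshold (default 12.0).
# Prints "ham", "spam" or "none" (not learned either way).
autolearn_decision() {
    awk -v s="$1" -v h="$2" -v b="$3" \
        -v ham="${4:-0.1}" -v spam="${5:-12.0}" 'BEGIN {
        if (s + 0 < ham + 0)
            print "ham"
        # Learning as spam also needs at least 3 header and 3 body points.
        else if (s + 0 > spam + 0 && h + 0 >= 3 && b + 0 >= 3)
            print "spam"
        else
            print "none"
    }'
}
```

For example, "autolearn_decision 15 1 14" prints "none": the total is over
12, but the header part is under 3 points.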

Auto learn is on by default, but these settings and scores can be explicitly
set or changed by adding a "parameter value" line to spam.assassin.prefs.conf
and reloading MailScanner.
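
As an illustration of the "parameter value" form, the lines below simply
restate the defaults from the quoted documentation. The values are those
defaults and not a tuning recommendation.

```
# spam.assassin.prefs.conf
# Auto-learn settings, written out with the documented default values.
bayes_auto_learn 1
bayes_auto_learn_threshold_nonspam 0.1
bayes_auto_learn_threshold_spam 12.0
```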

>
>      - Are there any settings in the new conf which can assist in
>  managing the file size for the bayes files?
>

I run a script in /etc/cron.daily called bayes-rebuild. The contents of
bayes-rebuild are simply:

-----Snip ------
#! /bin/bash
# rebuild the bayes database daily
/usr/bin/sa-learn --rebuild --force-expire
-----Snip ------

I also set

Rebuild Bayes Every = 0

in MailScanner.conf, since the rebuild is done from the cron.daily job. I
have had zero bayes problems with this setup, and bayes works very well with
almost no effort and no maintenance headaches.
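
Since the original problem was a full filesystem, a check along the
following lines could be added to the same cron job. This is only a sketch:
the function names and the 90% default limit are arbitrary choices of mine,
and it assumes the usual "df -P" output with the capacity in the fifth
column.

```shell
#!/bin/sh
# Print the percentage used on the filesystem that holds the given path.
fs_used_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn when usage reaches a limit (default 90, an arbitrary choice).
# Returns 0 if below the limit, 1 if at or above it, 2 if df gave
# nothing usable.
warn_if_full() {
    used=$(fs_used_percent "$1")
    limit=${2:-90}
    case $used in
        ''|*[!0-9]*) return 2 ;;
    esac
    if [ "$used" -ge "$limit" ]; then
        echo "WARNING: filesystem holding $1 is ${used}% full"
        return 1
    fi
    return 0
}

# Example call (hypothetical location):
#   warn_if_full /var/spool/spamassassin
```

Anything a cron.daily script prints is normally mailed to root by cron, so
the warning should show up without extra plumbing.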

Is it better to manually check and feed ham and spam to the Bayesian
database? Absolutely! But if you're too busy to do that, this setup will
still improve overall spam detection.

>      - Is it safe to use symbolic links to alternative locations for
>  these files?

You are able to move the bayes_* files and reference them as shown above.
Symbolic links should not be necessary and will only waste system time.

>
> Any help/advice appreciated.
> Cheers!
> Hendrik
>

No problem.

Steve

Stephen Swaney
President
Fortress Systems Ltd.
Steve.Swaney at FSL.com





