Bayes setup

Tue Apr 8 20:19:15 IST 2003

It's 200 messages, not 600, before the Bayes engine output starts
counting towards the score.

Steve Evans
SDSU Foundation
(619) 594-0653  

-----Original Message-----
From: Julian Field [mailto:mailscanner at ECS.SOTON.AC.UK] 
Sent: Tuesday, April 08, 2003 6:35 AM
To: MAILSCANNER at JISCMAIL.AC.UK

At 11:00 08/04/2003, you wrote:
>Greetings:
>
>I am running SpamAssassin 2.52 in MailScanner, and I've also been 
>following the discussions of the SpamBayes project fairly closely for 
>some months. One of the crucial elements of Bayesian detection is 
>training, but I don't see any place that documents how to get ham and 
>spam messages routed back to the server for training.
>
>Is there some documentation? Am I just missing it by installing 
>SpamAssassin from cpan and MailScanner from RPMs?

There are 2 parts to the answer to this:

1) You can set up a "spam" and a "notspam" email address for people to
dump wrongly categorised mail into. You then use sa-learn once every
hour (or day) to teach SpamAssassin about the messages it got wrong. I
have already posted a script to do this to this list, but have attached
it again for you.

2) SpamAssassin is unique in being able to "auto-learn", i.e. teach
itself. It uses its other traditional rules to produce a score for each
message. If the score is very high (i.e. definitely spam) or very low
(i.e. definitely ham) then it feeds the message back into the learning
code for the Bayes engine. It only starts using the Bayes engine output
as part of the overall message score once it has auto-learned about 600
messages (I might well be wrong on that figure, but it's a few hundred).
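
For part 1, here is a rough sketch of what a periodic training job might
look like. It is not the attached script. The mailbox paths and the
function names are made up for illustration, and it assumes the "spam"
and "notspam" addresses deliver into mbox files and that your sa-learn
accepts the --spam, --ham and --mbox options.

```python
import subprocess
from pathlib import Path

# Hypothetical mbox files fed by the "spam" and "notspam" addresses.
SPAM_MBOX = Path("/var/spool/mail/spam")
HAM_MBOX = Path("/var/spool/mail/notspam")


def build_learn_command(mbox, is_spam):
    """Return the sa-learn argument list for one mbox file."""
    kind = "--spam" if is_spam else "--ham"
    return ["sa-learn", kind, "--mbox", str(mbox)]


def train(mbox, is_spam, run=subprocess.run):
    """Feed one mbox to sa-learn; skip it if it is missing or empty."""
    mbox = Path(mbox)
    if not mbox.exists() or mbox.stat().st_size == 0:
        return False
    # check=True raises CalledProcessError if sa-learn exits non-zero.
    run(build_learn_command(mbox, is_spam), check=True)
    return True


if __name__ == "__main__":
    train(SPAM_MBOX, is_spam=True)
    train(HAM_MBOX, is_spam=False)
```

You would run something like this from cron every hour or day. The
sketch does not empty the mailboxes afterwards, so you would need to
handle that (with proper locking) yourself.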
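
For part 2, the auto-learn decision can be pictured roughly as below.
This is only an illustration of the logic described above, not
SpamAssassin's actual code. The two score thresholds are invented
values, the function names are hypothetical, and the minimum count uses
the 200 figure from the reply at the top of this message.

```python
# Illustrative thresholds only; not SpamAssassin's real defaults.
SPAM_LEARN_SCORE = 15.0   # at or above this, treat as definitely spam
HAM_LEARN_SCORE = -2.0    # at or below this, treat as definitely ham
MIN_LEARNED = 200         # messages learned before Bayes output is used


def auto_learn_class(rule_score):
    """Return "spam", "ham", or None if the score is not extreme enough."""
    if rule_score >= SPAM_LEARN_SCORE:
        return "spam"
    if rule_score <= HAM_LEARN_SCORE:
        return "ham"
    return None


def bayes_active(learned_count):
    """Bayes output only counts once enough messages have been learned."""
    return learned_count >= MIN_LEARNED


def overall_score(rule_score, bayes_score, learned_count):
    """Add the Bayes contribution only when the engine is active."""
    if bayes_active(learned_count):
        return rule_score + bayes_score
    return rule_score
```

Messages with middling scores are never fed back, so only the ones the
traditional rules are confident about end up training the Bayes engine.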