Recommendation on Bayes Score

Matt Kettler mkettler at EVI-INC.COM
Mon Aug 29 18:20:33 IST 2005


Max Kipness wrote:
> I have my Bayes score set at the default 3.5. A good amount of spam is
> caught by my system using Bayes, SA rules, Razor, DCC, and Pyzor. However
> there is an increasing amount of spam getting through. When I
> look at the headers of these particular emails, sometimes it might have a
> Razor score and/or DCC adding a bit to the score, and the Bayes is ALWAYS at
> 3.50. My score required for it to be considered spam is 6.
> 
> Should I raise the Bayes score and just make sure the Bayes databases are
> well tuned?

Before you do so, consider the following.

BAYES_99 means that Bayes estimates the message's spam probability at between
99% and 100%.

The midpoint of that range is 99.5%, so in a statistically perfect world, with
perfect training, about 0.5% of the messages matching BAYES_99 will be nonspam
messages.

Of course, the world isn't statistically perfect, nor will your training be
perfect. You might have more or less nonspam matching BAYES_99, but the
"statistical ideal" should give you a basis to make judgments from until you
take some actual measurements.
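
As a rough worked version of that estimate, here is a small Python sketch. It
assumes the Bayes probabilities of matching messages are spread evenly across
the 99-100% bucket, which is an idealization and not something measured; the
function name is made up for illustration.

```python
def expected_nonspam_fraction(p_low, p_high):
    """Expected nonspam fraction for a Bayes bucket.

    Assumes (hypothetically) that spam probabilities are uniformly
    distributed between p_low and p_high, so the bucket's mean spam
    probability is simply the midpoint.
    """
    mean_spam_probability = (p_low + p_high) / 2.0
    return 1.0 - mean_spam_probability


# BAYES_99 covers spam probabilities from 99% to 100%.
bayes_99_nonspam = expected_nonspam_fraction(0.99, 1.00)
print(f"Ideal nonspam share of BAYES_99 hits: {bayes_99_nonspam:.3%}")
```

Your real nonspam share depends on your training, so treat this only as a
starting point until you have measured it on your own mail.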


That said, if FNs are such a problem for you, why is your threshold set at 6
instead of the default 5? Raising the threshold suggests to me you're willing to
accept more FNs to avoid FPs. In fact, according to STATISTICS-set3.txt from SA
3.0.4, your FN rate should be about 42% higher (3.50% vs. 2.47%) than at the
default threshold of 5.0. Of course, your FP rate should also be about 44% lower
(5 vs. 9 messages)...

# SUMMARY for threshold 5.0:
# Correctly non-spam:  29443  99.97%
# Correctly spam:      27220  97.53%
# False positives:         9  0.03%
# False negatives:       688  2.47%

# SUMMARY for threshold 6.0:
# Correctly non-spam:  29447  99.98%
# Correctly spam:      26931  96.50%
# False positives:         5  0.02%
# False negatives:       977  3.50%
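
The relative changes between the two thresholds can be checked directly from
the counts in those two summaries. A minimal Python sketch follows; the
dictionary layout and the helper name are just for illustration and are not
part of SpamAssassin.

```python
# FP/FN counts copied from the two SUMMARY blocks
# (STATISTICS-set3.txt, SA 3.0.4).
stats = {
    5.0: {"fp": 9, "fn": 688},
    6.0: {"fp": 5, "fn": 977},
}


def relative_change(old, new):
    """Fractional change going from old to new (0.42 means 42% higher)."""
    return (new - old) / old


fn_change = relative_change(stats[5.0]["fn"], stats[6.0]["fn"])
fp_change = relative_change(stats[5.0]["fp"], stats[6.0]["fp"])

print(f"FN change from threshold 5.0 to 6.0: {fn_change:+.0%}")
print(f"FP change from threshold 5.0 to 6.0: {fp_change:+.0%}")
```

These figures describe the test corpus behind that file; the rates on your own
mail stream may differ.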


If FPs are intolerable to you, I would be very cautious about raising
BAYES_99's score all the way into the spam-tag range, where a BAYES_99 hit
alone would be enough to tag a message as spam.
