Recommendation on Bayes Score
Matt Kettler
mkettler at EVI-INC.COM
Mon Aug 29 18:20:33 IST 2005
Max Kipness wrote:
> I have my Bayes score set at the default 3.5. A good amount of spam is
> caught by my system using Bayes, SA rules, Razor, DCC, and Pyzor. However
> there is an increasing amount of spam getting through. When I
> look at the headers of these particular emails, sometimes it might have a
> Razor score and/or DCC adding a bit to the score, and the Bayes is ALWAYS at
> 3.50. My score required for it to be considered spam is 6.
>
> Should I raise the Bayes score and just make sure the Bayes databases are
> well tuned?
Before you do so, consider the following.
BAYES_99 represents a spam probability, as estimated by Bayes, of between 99% and
100% for the message.
This means that in a statistically perfect world, with perfect training, about 0.5%
(one minus the 99.5% midpoint of that band) of the messages matching BAYES_99
will be nonspam messages.
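To make that "statistical ideal" concrete, here is a minimal Python sketch. It assumes, purely for illustration, that Bayes probabilities are perfectly calibrated and spread evenly across the 99-100% band, so the average spam probability is the midpoint. The function name and the message volume are hypothetical, not taken from SpamAssassin.

```python
# Sketch: expected nonspam count among messages matching BAYES_99.
# Assumption (hypothetical): perfectly calibrated probabilities spread
# evenly over the BAYES_99 band, so the mean spam probability is its midpoint.

BAYES_99_LOW = 0.99   # lower bound of the BAYES_99 probability band
BAYES_99_HIGH = 1.00  # upper bound of the band


def expected_nonspam(hits: int,
                     low: float = BAYES_99_LOW,
                     high: float = BAYES_99_HIGH) -> float:
    """Expected number of nonspam messages among `hits` BAYES_99 matches."""
    mean_spam_prob = (low + high) / 2  # 0.995 for the default band
    return hits * (1 - mean_spam_prob)


if __name__ == "__main__":
    # Hypothetical volume: 10,000 messages matching BAYES_99.
    print(round(expected_nonspam(10_000), 2))  # 50.0
```

Even under ideal conditions, a high-volume site should expect some legitimate mail among its BAYES_99 hits; real-world training will move that number up or down.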
Of course, the world isn't statistically perfect, nor will your training be
perfect. You might have more or less nonspam matching BAYES_99, but the
"statistical ideal" should give you a basis to make judgments from until you
take some actual measurements.
That said, if false negatives (FNs) are such a problem for you, why is your
threshold set at 6 instead of the default 5? Raising the threshold suggests to me
you're willing to accept more FNs to avoid false positives (FPs). In fact, according
to STATISTICS-set3.txt from SA 3.0.4, your FN rate should be 42% higher than at the
default of 5.0. Of course, your FP rate will also be 44% lower...
# SUMMARY for threshold 5.0:
# Correctly non-spam: 29443 99.97%
# Correctly spam: 27220 97.53%
# False positives: 9 0.03%
# False negatives: 688 2.47%
# SUMMARY for threshold 6.0:
# Correctly non-spam: 29447 99.98%
# Correctly spam: 26931 96.50%
# False positives: 5 0.02%
# False negatives: 977 3.50%
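The 42% and 44% figures follow directly from the raw counts above. A quick Python check, using only the numbers quoted from STATISTICS-set3.txt (the dictionary layout and helper name are just for illustration):

```python
# FP/FN counts copied from the STATISTICS-set3.txt excerpt (SA 3.0.4).
stats = {
    5.0: {"fp": 9, "fn": 688},
    6.0: {"fp": 5, "fn": 977},
}


def relative_change(old: int, new: int) -> float:
    """Percentage change when going from `old` to `new`."""
    return (new - old) / old * 100


fn_change = relative_change(stats[5.0]["fn"], stats[6.0]["fn"])
fp_change = relative_change(stats[5.0]["fp"], stats[6.0]["fp"])

print(f"FN: {fn_change:+.0f}%")  # FN: +42%
print(f"FP: {fp_change:+.0f}%")  # FP: -44%
```

Note how small the FP counts are (9 versus 5), so the FP percentage is much less stable than the FN one.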
If FPs are highly intolerable to you, I would be very cautious about bringing
BAYES_99's score all the way into the spam-tag range, where a BAYES_99 hit alone
would be enough to tag a message.
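A minimal sketch of the arithmetic behind that caution, assuming (for illustration) that a message is tagged once its summed rule scores reach the threshold. Apart from BAYES_99, the rule name `OTHER_RULES` and all scores here are hypothetical, not SpamAssassin defaults.

```python
# Sketch: does a message cross the spam threshold?
# Assumption: tagged when the summed rule scores reach the threshold.
# "OTHER_RULES" and its score are hypothetical stand-ins for Razor, DCC, etc.


def is_tagged(rule_scores: dict[str, float], threshold: float) -> bool:
    """Return True if the summed rule scores reach the threshold."""
    return sum(rule_scores.values()) >= threshold


THRESHOLD = 6.0  # the poster's required score

# BAYES_99 at 3.5 needs another 2.5 points from other rules to tag.
bayes_only = {"BAYES_99": 3.5}
bayes_plus_other = {"BAYES_99": 3.5, "OTHER_RULES": 1.5}

# BAYES_99 raised into the tag range: Bayes alone now tags the message,
# so any nonspam that hits BAYES_99 becomes a false positive.
bayes_raised = {"BAYES_99": 6.0}

for name, scores in [("bayes_only", bayes_only),
                     ("bayes_plus_other", bayes_plus_other),
                     ("bayes_raised", bayes_raised)]:
    print(name, sum(scores.values()), is_tagged(scores, THRESHOLD))
```

Keeping BAYES_99 below the threshold means some other rule must agree before a message is tagged, which is what protects you from the roughly 0.5% of BAYES_99 hits that are legitimate.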