Why is BAYES_00 -2.60 scoring low like this.

Anthony Peacock a.peacock at chime.ucl.ac.uk
Thu Feb 15 09:53:09 CET 2007


shuttlebox wrote:
> On 2/14/07, Renee Gehlbach <krgehlba at lexairinc.com> wrote:
>> Or better yet, use sa-learn to relearn any spam marked BAYES_00.  Or,
>> for even better results, any spam not scoring BAYES_99.  (While learning
>> suitable ham, too.)  The goal is not simply to lower the amount Bayes
>> filtering messes up your scoring when it's wrong, continuing to permit
>> it to assess spam incorrectly (if you don't want bayes to affect your
>> scores, why use up the resources it requires?), but instead to have it
>> actually correctly assess whether a message is in fact spam.
> 
> I agree with you on principle but to me Bayes is not as important as
> it used to be. With spammers using real text it's hard for it to do a
> good job. I would rather avoid the hassle of training it, to me it's
> not worth the effort but YMMV.

YMMV, but I have to disagree with that sentiment.  Bayes is extremely 
accurate for our systems.  And feeding in the odd FP and FN takes no 
time or effort at all.
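
For anyone wondering what feeding in the odd FP and FN can look like, 
here is a minimal sketch, not necessarily how any particular site does 
it.  It assumes sa-learn is on the PATH and is run as whichever user 
owns the Bayes database.  The folder paths and the helper names 
(sa_learn_command, relearn) are made up for illustration; only the 
--spam and --ham options come from sa-learn itself.

```python
import subprocess


def sa_learn_command(kind, path):
    """Build the sa-learn command line for a message file or directory.

    kind is "spam" for a missed spam (false negative) or
    "ham" for a legitimate mail wrongly flagged (false positive).
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind, path]


def relearn(kind, path, dry_run=True):
    """Run sa-learn on path, or only return the command when dry_run is set."""
    cmd = sa_learn_command(kind, path)
    if not dry_run:
        # check=True raises CalledProcessError if sa-learn exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical folders holding the misclassified messages.
    print(" ".join(relearn("spam", "/var/spool/relearn/false-negatives")))
    print(" ".join(relearn("ham", "/var/spool/relearn/false-positives")))
```

With dry_run left at its default the script only prints the commands it 
would run, which is a cheap way to check the paths before touching the 
Bayes database.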

According to my MailWatch stats:

BAYES_99 hit 96.8% spam
BAYES_00 hit 89.1% ham

I keep the Bayes scores as per the distribution and, in combination with 
the other rulesets, I catch 99.5% of all spam on average.

-- 
Anthony Peacock
CHIME, Royal Free & University College Medical School
WWW:    http://www.chime.ucl.ac.uk/~rmhiajp/
"If you have an apple and I have  an apple and we  exchange apples
then you and I will still each have  one apple. But  if you have an
idea and I have an idea and we exchange these ideas, then each of us
will have two ideas." -- George Bernard Shaw

