Lots of spam gets through because of BAYES_00 -2.60

Gareth list-mailscanner at linguaphone.com
Wed Sep 12 13:56:50 IST 2007


Bayes does work well for us. It just does not work quite as well as on my
home system, which is trained on every mail: even when something is
already identified as spam and Bayes is 80% certain, I retrain on it just
to reinforce the result.

That's not practical in a company that gets as much mail as we do.

MailScanner stats for the last 30 days:
Identified spam - 13487
BAYES_99 - 9970
BAYES_50 - 1422
BAYES_80 - 635
BAYES_95 - 631
BAYES_60 - 494
BAYES_00 - 254
BAYES_20 - 100
BAYES_40 - 95
BAYES_05 - 46
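
To put those counts in proportion, here is a rough Python sketch that turns
them into percentages. It assumes the per-rule figures are hits among mail
tagged as spam. They add up to 13647, slightly more than the 13487 total, so
the sum of the rule hits is used as the denominator and the percentages
should be read as approximate.

```python
# Counts copied from the MailScanner stats above (last 30 days).
bayes_hits = {
    "BAYES_99": 9970,
    "BAYES_50": 1422,
    "BAYES_80": 635,
    "BAYES_95": 631,
    "BAYES_60": 494,
    "BAYES_00": 254,
    "BAYES_20": 100,
    "BAYES_40": 95,
    "BAYES_05": 46,
}

# Assumption: use the sum of rule hits as the denominator, because the
# per-rule counts do not add up exactly to the "identified spam" total.
total_hits = sum(bayes_hits.values())


def share(rule):
    """Percentage of all Bayes rule hits that fell into this bucket."""
    return 100.0 * bayes_hits[rule] / total_hits


# Buckets where Bayes leaned towards "not spam" on mail that was spam.
low_buckets = ["BAYES_00", "BAYES_05", "BAYES_20", "BAYES_40"]
low_hits = sum(bayes_hits[r] for r in low_buckets)
low_share = 100.0 * low_hits / total_hits

for rule, hits in sorted(bayes_hits.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{rule}: {hits:5d}  {share(rule):5.1f}%")
print(f"low buckets: {low_hits} hits, {low_share:.1f}%")
```

On these figures the four lowest buckets (BAYES_00 to BAYES_40) account for
495 hits, roughly 3.6% of the total, with BAYES_00 alone at under 2%.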

I think that's good for a single rule. I just don't believe that Bayes not
thinking something is spam is a good reason to give such a large negative
score.
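
A small Python sketch of why the size of the negative score matters. The 5.0
threshold and the 6.0 total from other rules are made-up numbers for
illustration only; -2.6 is the BAYES_00 score from the subject line and -0.5
is the override quoted further down.

```python
# Hypothetical values, chosen only to illustrate the effect.
REQUIRED_SCORE = 5.0     # assumed spam threshold
OTHER_RULES_TOTAL = 6.0  # assumed sum of all non-Bayes rule scores


def is_spam(bayes_00_score, other=OTHER_RULES_TOTAL, required=REQUIRED_SCORE):
    """A message is tagged as spam when its total score reaches the threshold."""
    return other + bayes_00_score >= required


print(is_spam(-2.6))  # total is about 3.4, below the threshold
print(is_spam(-0.5))  # total is 5.5, at or above the threshold
```

With these assumed numbers the same message slips through at -2.6 but is
tagged at -0.5.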

Every day we use RBLs to reject mail addressed to over 5000 recipients. Of
what's left, we get 1000 mails a day, of which about half is spam. In the
last week two spams have got through and we have had one false positive.
We are obviously doing something right to get such good results.
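
For scale, those figures work out as follows. This is a back-of-the-envelope
Python sketch that assumes a 7-day week, exactly 1000 mails a day after RBL
rejection, and an even spam/ham split; none of those are measured values.

```python
# Figures from the paragraph above; the exact values are assumptions.
mails_per_day = 1000   # mail left after RBL rejection
spam_fraction = 0.5    # "about half is spam"
days = 7               # "the last week"
missed_spam = 2        # spams that got through
false_positives = 1    # ham wrongly flagged as spam

spam_total = mails_per_day * spam_fraction * days
ham_total = mails_per_day * (1 - spam_fraction) * days

fn_rate = missed_spam / spam_total      # share of spam that was missed
fp_rate = false_positives / ham_total   # share of ham that was flagged

print(f"missed spam: {fn_rate * 100:.3f}% of about {spam_total:.0f} spams")
print(f"false positives: {fp_rate * 100:.3f}% of about {ham_total:.0f} hams")
```

Under those assumptions that is roughly 0.06% of spam missed and 0.03% of
ham wrongly flagged over the week.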

On Wed, 2007-09-12 at 13:38, Greg Matthews wrote:
> Gareth wrote:
> > Personally I find that it is very difficult to make bayes particularly 
> > effective in a corporate environment because of the variety of mails 
> 
> This is not a reflection on the usefulness of Bayes. Proper 
> configuration will make this an extremely useful part of the anti-spam 
> suite.
> 
> > people receive. Therefore I find the low-scoring Bayes rules give far 
> > too big a negative score. I tend to override the low and high scores with 
> > the following:
> >  
> > score BAYES_00 -0.5
> > score BAYES_05 -0.1
> > score BAYES_20 -0.01
> > score BAYES_40 -0.01
> > score BAYES_99  5.0
> > 
> 
> Interesting, your high-end scores aren't as conservative as your low 
> end. I wonder if you are managing to auto-learn enough ham? You know you 
> can adjust the autolearn thresholds, don't you? It's quite common for Bayes 
> to have far more spam to learn from than ham, which without attention 
> results in having to skew the scores as you have above.
> 
> Personally, I have great success with Bayes on relays that filter around 
> 20-30k messages per day across 20-30 domains and around 5000 mailboxes. 
> I am careful, though, to feed all false positives flagged up by users 
> (perhaps as many as 5 per week) back into the system. I also feed back 
> all my own (personal) false negatives, which may be as many as 10 per 
> week (<1% of my mail).
> 
> In summary, if Bayes is not working for you, it's worth taking the time 
> to get it right rather than simply skewing the scores.
> 
> -- 
> Greg Matthews           01491 692445
> Head of UNIX/Linux, iTSS Wallingford


