Ideas for improved bayes learning

Wed Sep 19 10:50:12 IST 2007

> -----Original Message-----
> From: mailscanner-bounces at lists.mailscanner.info
> [mailto:mailscanner-bounces at lists.mailscanner.info] On Behalf Of Gareth
> Sent: 19 September 2007 09:57
> To: MailScanner discussion
> Subject: Ideas for improved bayes learning
>
> Bayes normally autolearns a mail as spam if the score is over 20.
> This is configurable.
> Many of us use RBLs on the MTA to reject known spam.
>
> I was thinking that it might be useful, instead of rejecting the RBL
> mail, to accept it, train Bayes on it and then discard it.
>
> However, I believe that the RBL checks that SpamAssassin performs are on
> all the Received lines and not just the IP address our mail servers
> received the mail from?
> If that is correct then I cannot simply assign a high score to the RBL
> checks and have MailScanner delete very high scoring mail.
>
> Ideally, what I was thinking of would be a couple of enhancements to
> MailScanner:
>
> 1) Add a new action of sa-learn-spam so the mail can be learnt. You can
> use a custom rule to fire this if an RBL matches, so the mail is learnt
> and then deleted.

In theory this is a great idea. In practice, however, you do find from
time to time that mail servers on blacklists are not only sending spam.
For example, a misconfigured mail server acting as an open relay will
relay both spam and ham, which would result in ham being fed into the
Bayes database as spam.

> 2) Incorporate MailScanner's RBL feature (I assume this one only checks
> one Received header) into the rules which can be used when writing a
> custom action.
>
> It's only an idea and not a request for a new feature. Personally,
> MailScanner is working very well for us, so at this time it is not worth
> allowing all the extra mail in just to improve the Bayes effectiveness.

I've set up a sort of custom block list on our own MailScanner servers
in conjunction with MailWatch. A script runs every few minutes and pulls
all the client IPs from maillog where the total number of high-scoring
spam messages is the same as the total number of messages, i.e. senders
who have only ever sent high-scoring spam, and dumps these into a second
SQL table. Then, as I'm using postfix-mysql as the MTA, I have a check
in smtpd_client_restrictions which queries this table; if the client is
listed there, it is rejected at the MTA level. Feel free to mail me off
list and I'll be happy to share the scripts that do this and help with
them. You may be able to integrate bits of this with improving the
autolearn (so high spam from a sender that has only sent high spam gets
fed to Bayes).
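
For anyone who wants a starting point, here is a minimal sketch of the
selection step described above. It is not the actual script: it uses an
in-memory SQLite database so it runs anywhere, it assumes the maillog
table has columns named clientip and ishighspam, and the second table
name (spam_only_clients) and the function names are made up for
illustration. A real setup would run the same kind of query against the
MySQL database that MailWatch writes to.

```python
import sqlite3


def find_spam_only_clients(conn):
    """Return client IPs whose every logged message was high-scoring spam.

    Assumes a maillog table with clientip and ishighspam columns
    (column names are an assumption about the schema).
    """
    rows = conn.execute(
        """
        SELECT clientip
        FROM maillog
        GROUP BY clientip
        HAVING COUNT(*) = SUM(CASE WHEN ishighspam > 0 THEN 1 ELSE 0 END)
        ORDER BY clientip
        """
    ).fetchall()
    return [row[0] for row in rows]


def refresh_blocklist(conn):
    """Rebuild the second table (hypothetical name) from the current maillog."""
    ips = find_spam_only_clients(conn)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS spam_only_clients (clientip TEXT PRIMARY KEY)"
    )
    conn.execute("DELETE FROM spam_only_clients")
    conn.executemany(
        "INSERT INTO spam_only_clients (clientip) VALUES (?)",
        [(ip,) for ip in ips],
    )
    conn.commit()
    return ips


def demo():
    """Build a tiny sample maillog and return the resulting block list."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE maillog (clientip TEXT, ishighspam INTEGER)")
    conn.executemany(
        "INSERT INTO maillog VALUES (?, ?)",
        [
            ("192.0.2.10", 1), ("192.0.2.10", 1),  # only high spam
            ("192.0.2.20", 1), ("192.0.2.20", 0),  # mixed spam and ham
            ("192.0.2.30", 0),                     # only ham
        ],
    )
    return refresh_blocklist(conn)


if __name__ == "__main__":
    print(demo())
```

In the sample data only the first address has sent nothing but high spam,
so it is the only one that ends up in the second table. The open relay
case mentioned earlier (a sender with both spam and ham) is left out
because its two counts differ.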

-- 
MailScanner mailing list
mailscanner at lists.mailscanner.info
http://lists.mailscanner.info/mailman/listinfo/mailscanner

Before posting, read http://wiki.mailscanner.info/posting

Support MailScanner development - buy the book off the website! 

This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they are
addressed. If you have received this email in error please notify the
system manager. This message contains confidential information and is
intended only for the individual named. If you are not the named
addressee you should not disseminate, distribute or copy this e-mail.
