Thoughts on new Bayes idea

Fri Jun 4 17:50:40 IST 2004

At 17:09 04/06/2004, you wrote:
>On Jun 4, 2004, at 8:08 AM, Max Kipness wrote:
>
>>I've been pondering this idea for a while, but wanted some opinions on
>>how feasible it would be...and the load it would cause.
>>
>>I currently have all users who receive spam that bypassed MailScanner
>>simply forward the email to spam at ourdomain.com. The sender address was
>>then blacklisted, and there was an option to put 'domain' in the subject
>>header to blacklist the entire domain. This worked well, but the blacklist
>>grew to around 1600 addresses/domains and I started to get many SA
>>timeouts. This was before implementing Bayes, which is working great,
>>perhaps too aggressively given the false positives, but that's another story.
>>
>>My idea is basically to archive every email that enters the system
>>(through MS) for a day or so. I've got a script that deletes all emails
>>older than a specified age from an mbox file. Then, building on my script
>>from above, users would forward the spam to spam at ourdomain.com, and a
>>new script would fetch the original email out of the archive and feed it
>>to Bayes.
>>
>>Any thoughts on this? Is it ridiculous?
>>
>>Most of my users are on various Exchange servers, and there really is no
>>easy way to get the email fed into Bayes. I know you can do a public
>>folder, but then you have to train each user how to get it there, and
>>they have to open the public folder tree, etc. Using IMAP is even more
>>administration. I've found that simply forwarding the email somewhere is
>>very easy for them.
>
>My main concern about these sorts of schemes is that one man's trash is
>another man's treasure.  As the size of your user base increases, it is
>inevitable that you will have users with different opinions about
>where to draw the line between spam and ham (or users who are fanatical
>about identifying even organization-wide announcements as spam, or who are
>fanatical about preventing censorship and thus do not want ANY message to
>be marked as spam).
>
>As a result, I tend to avoid any mechanism in which the user directly
>contributes to a site-wide configuration (site-wide blacklists, site-wide
>Bayes DB, etc.).  Indirect contributions, by submitting messages for human
>review, are fine (though that gets into the problem of spending all of some
>sysadmin's time reviewing spam), but the user should never directly say
>"learn this as spam/ham" for the site-wide database.

I agree that it is not an ideal solution for all organisations. But it
would be very useful for many people. Whether to adopt a scheme like this
is a management policy decision, not a technical one.

Organisation-wide announcements are easily handled by whitelisting them. In
my own setup (which is obviously not ideal for everyone) I stop
announcements being marked as spam, but I do impose a very strict size
limit on them. This keeps most of my users happy most of the time.
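
For anyone wanting to try the archive-and-lookup idea quoted above, here is a
rough sketch of the lookup step, not a tested production script. It assumes
the archive is a single mbox file and that the forwarded copy can be matched
to the archived original by Subject alone, which is crude and will pick the
wrong message when subjects collide. The function names and the archive path
are made up for illustration. The only external command used is
"sa-learn --spam", reading the message from standard input.

```python
import mailbox
import re
import subprocess

# Forwarding/reply prefixes that mail clients add to the subject line.
_PREFIX = re.compile(r"^\s*((fw|fwd|re)\s*:\s*)+", re.IGNORECASE)


def normalise_subject(subject):
    """Strip 'FW:', 'Fwd:', 'Re:' prefixes, trim, and lowercase."""
    return _PREFIX.sub("", str(subject)).strip().lower()


def find_original(archive_path, forwarded_subject):
    """Return the first archived message whose subject matches, else None.

    Hypothetical helper: matching on Subject alone is an assumption made
    for this sketch, not something the original scheme specifies.
    """
    wanted = normalise_subject(forwarded_subject)
    box = mailbox.mbox(archive_path)
    try:
        for msg in box:
            if normalise_subject(msg.get("Subject", "")) == wanted:
                return msg
    finally:
        box.close()
    return None


def learn_as_spam(msg):
    """Pipe one message to 'sa-learn --spam' on standard input."""
    subprocess.run(["sa-learn", "--spam"], input=msg.as_bytes(), check=True)
```

A script handling mail sent to the spam address would call something like
find_original("/var/spool/archive.mbox", subject_of_forward) and, if a
message comes back, pass it to learn_as_spam(). Whether that last step runs
automatically or waits for a human to review it is the policy question
discussed above.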

--
Julian Field
www.MailScanner.info
Professional Support Services at www.MailScanner.biz
MailScanner thanks transtec Computers for their support
PGP footprint: EE81 D763 3DB0 0BFD E1DC 7222 11F6 5947 1415 B654
