bayes

Thu Feb 12 22:06:05 GMT 2004

At 04:53 PM 2/12/2004, Joe Stuart wrote:
>I just set up bayes like explained in the Mailscanner faq. With two
>email boxes named spam and notspam.  Then I have a cronjob that runs the
>sa-learn script on them.  My plan is to have the users send their email
>to the respected mailboxes. My only concern is that lets say I send 100
>messages to spam at mydomain.com will the filters start to think that mail
>coming from me is spam, or is the sa-learn script smarter than that?
>Also, is that the way others are using bayes in an organizational
>setting?

No, sa-learn is NOT smarter than that. But it's not as dumb as learning
"mail from xxx is spam" either.

SA's bayes engine tokenizes headers: message ID patterns, MIME boundary
patterns, all kinds of things. It will wind up learning "any message that
looks like it was forwarded by this mail client is spam" as a result.

It's 100% impossible for SA to ever learn correctly from a generic forwarded
message. Most forwards have lost their headers, had text added to the body,
HTML re-encoded by the client, attachments stripped, multipart/alternative
parts converted to single-part or vice versa, etc.

SA MUST be fed a message that has its original, unmangled body and its
original headers included. Anything else will poison your bayes database.

You _can_ however forward the message (with complete headers) as an
attachment, and have a cron job that extracts the attachments and feeds
them to sa-learn.
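
As a rough sketch of that extraction step (not a tested production script),
something like the following Python could be run from cron. It assumes users
forward with "forward as attachment", so the original arrives as a
message/rfc822 part. The function names and the temp-file handling are my own
illustration; only the sa-learn --spam / --ham switches come from SA itself.
Note that re-serializing the attachment may not be byte-for-byte identical to
what was received, so check a few samples before trusting it.

```python
import email
import os
import subprocess
import tempfile


def extract_attached_messages(raw):
    """Return each message/rfc822 attachment of a forwarded mail as bytes.

    Only the attached originals are returned; the forwarding wrapper
    (the user's own headers and any note they typed) is discarded.
    """
    originals = []

    def visit(part):
        if part.get_content_type() == "message/rfc822":
            payload = part.get_payload()
            # For message/rfc822 the payload is a list holding one Message.
            if isinstance(payload, list) and payload:
                originals.append(payload[0].as_bytes())
            return  # do not descend into the original message itself
        if part.is_multipart():
            for sub in part.get_payload():
                visit(sub)

    visit(email.message_from_bytes(raw))
    return originals


def learn(raw_original, is_spam):
    """Hypothetical helper: hand one extracted message to sa-learn."""
    fd, path = tempfile.mkstemp(suffix=".eml")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(raw_original)
        flag = "--spam" if is_spam else "--ham"
        subprocess.run(["sa-learn", flag, path], check=True)
    finally:
        os.remove(path)
```

The cron job would read each message from the spam or notspam mailbox, call
extract_attached_messages() on it, and pass every result to learn() with the
matching flag. Anything that arrives without an attached original should be
skipped rather than learned.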