Is this really how bayes+autolearn works?
Scott Silva
ssilva at sgvwater.com
Tue Dec 12 22:45:11 GMT 2006
Furnish, Trever G spake the following on 12/12/2006 1:59 PM:
> My Bayes db seems to consistently start assigning BAYES_00 (-2.6) to
> messages that are simple plain text messages you'd think would be easily
> caught. The messages are all seemingly almost identical. They're
> coming from bots, and only a small percentage of them are caught by
> other SA rules.
>
> Does that mean that they're being auto-learned as Ham and then
> cancelling out my attempts to teach Bayes later that this is spam? Out
> of the many thousands that flood in, I'm only able to retrain bayes
> using a small percentage (because only a small subset of my users drag
> spam into the retraining system consistently).
>
> The reason they're not caught by other SA rules is that they're
> coming from bots. Many of the samples I've looked at also wouldn't have
> been caught by John Rudd's Botnet plugin.
>
> So Bayes is getting lots of messages that SA doesn't detect as spam, and
> only a few similar messages that I train it to treat as spam. Is this a
> plausible explanation for why Bayes would consistently be misclassifying
> this mail?
>
> So far the floods start in the afternoon and the subject strings are
> consistent enough that I'm able to correct the damage by:
> - removing my Bayes database and retraining from an archived spam
> corpus (slow)
> - creating custom rules to, for example, filter out "Subject =~
> /Good Morning/" (dangerous)
>
I also see a lot of spam coming from bots, but I consistently catch most of
it. Are you using some good add-on rules?
Do you have any samples that some of us could run through our systems to see
what we get?
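
To make the question above concrete, here is a rough sketch of how a flood of
auto-learned ham could swamp a handful of manual spam trainings. It uses a
simplified per-token estimate, not SpamAssassin's actual Bayes computation
(which is more involved), and every count in it is a made-up assumption for
illustration only.

```python
def token_spam_probability(spam_hits, spam_total, ham_hits, ham_total):
    """Simplified per-token estimate (NOT SpamAssassin's real formula).

    Compares how often a token shows up in learned spam versus learned ham.
    """
    spam_rate = spam_hits / spam_total if spam_total else 0.0
    ham_rate = ham_hits / ham_total if ham_total else 0.0
    if spam_rate + ham_rate == 0:
        return 0.5  # token never seen: treat as neutral
    return spam_rate / (spam_rate + ham_rate)


# Hypothetical database before the flood:
# the token appears in 40 of 1000 learned spam and 10 of 1000 learned ham.
before = token_spam_probability(40, 1000, 10, 1000)

# Hypothetical flood: 3000 bot messages containing the token slip past the
# other rules and are auto-learned as ham, while only 50 are hand-trained
# as spam by users.
after = token_spam_probability(40 + 50, 1000 + 50, 10 + 3000, 1000 + 3000)

print(f"before flood: {before:.2f}")  # roughly 0.80, token looks spammy
print(f"after flood:  {after:.2f}")   # roughly 0.10, token now looks hammy
```

With these invented numbers the token moves from looking like spam to looking
like ham, even though some of the flood was correctly hand-trained. Whether
that is what is happening here depends on what the database was actually
auto-learning.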
--
MailScanner is like deodorant...
You hope everybody uses it, and
you notice quickly if they don't!!!!