Is this really how bayes+autolearn works?

Tue Dec 12 21:59:56 GMT 2006

My Bayes DB consistently seems to start assigning BAYES_00 (-2.6) to
simple plain-text messages that you'd think would be easily caught.
The messages all seem nearly identical.  They're coming from bots, and
only a small percentage of them are caught by other SA rules.

Does that mean they're being autolearned as ham, cancelling out my
later attempts to teach Bayes that they're spam?  Out of the many
thousands that flood in, I can retrain Bayes on only a small percentage
(because only a small subset of my users consistently drag spam into the
retraining system).

The reason they're not caught by other SA rules is that they're
coming from bots.  Many of the samples I've looked at wouldn't have
been caught by John Rudd's Botnet plugin either.

So Bayes is getting lots of messages that SA doesn't detect as spam, and
only a few similar messages that I train it to treat as spam.  Is this a
plausible explanation for why Bayes would consistently misclassify
this mail?
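To make that hypothesis concrete, here is a toy calculation. It uses a simplified Graham-style per-token spam probability, not SpamAssassin's actual Bayes code (which differs in detail), and the corpus sizes and hit counts below are made-up assumptions, not numbers from my system.

```python
# Toy illustration only: a simplified Graham-style per-token spam
# probability, NOT SpamAssassin's real Bayes implementation.
# All counts below are hypothetical.

def token_spamminess(spam_hits, n_spam, ham_hits, n_ham):
    """Estimate P(spam | token) from per-class message frequencies."""
    spam_freq = spam_hits / n_spam
    ham_freq = ham_hits / n_ham
    if spam_freq + ham_freq == 0:
        return 0.5  # token never seen: treat as neutral
    return spam_freq / (spam_freq + ham_freq)

n_spam, n_ham = 5000, 5000  # assumed sizes of the trained corpora

# Before the flood: a bot-message token appears only in 20 hand-trained spams.
before = token_spamminess(20, n_spam, 0, n_ham)

# After the flood: 1000 bot messages carrying the same token are
# autolearned as ham, while the hand-trained spam count stays at 20.
after = token_spamminess(20, n_spam, 1000, n_ham + 1000)

print(f"before: {before:.3f}  after: {after:.3f}")
```

It prints `before: 1.000  after: 0.023`: once the flood is autolearned as ham, a token that used to be a strong spam sign ends up pointing toward ham, which would fit the BAYES_00 hits I'm seeing.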

So far, the floods start in the afternoon, and the subject strings are
consistent enough that I'm able to correct the damage by:
    - removing my Bayes database and retraining from an archived spam
      corpus (slow)
    - creating custom rules to, for example, filter out "Subject =~
      /Good Morning/" (dangerous)

--
Trever Furnish, tgfurnish at herffjones.com
Herff Jones, Inc. Unix / Network Administrator
Phone: 317.612.3519
Any sufficiently advanced technology is indistinguishable from Unix.